REMARKS 



I. Explanation of Amendments and Interview Summary 

The Applicants acknowledge with thanks the courtesy extended by the 
Examiner to the Applicants' attorney, David A. Gass, during a personal interview on August 2, 
2007, during which time the rejections in the outstanding Office action were discussed. The 
Applicants proposed the presentation of a meta-analysis of data pertaining to the invention 
and the Examiner encouraged the Applicants to present such an analysis in the form of a Rule 
132 declaration. 

The claims were amended to remove reference to "stroke" in order to expedite 
allowance of a preferred embodiment, and not for reasons related to patentability. 

The final clause of claim 61, pertaining to interpretation of the screening 
results where the haplotype of interest is absent, was amended to clarify that "the absence of 
the haplotype in the nucleic acid of the individual identifies the individual as not having the 
elevated susceptibility to MI due to the haplotype." It will be self-evident that any screening 
test leads to one conclusion if the test is positive and another conclusion if the test is negative. 
The final clause of the claim as originally presented refers to the relative risk from the tested 
haplotype, and a person of ordinary skill would not have interpreted the claim as drawing a 
conclusion that the individual has no elevated susceptibility to MI from other, untested 
factors. That, however, appears to be a conclusion drawn by the Patent Office, resulting in a 
rejection alleging lack of enablement. The current amendment further clarifies a conclusion 
that would be reached for a subject that is found not to carry the haplotype. Any further fine- 
tuning of this clause should be amenable to resolution by telephonic interview because the 
Applicants and the Patent Office appear to intend the same scope and meaning for this 
element of the claim. 

II. Remarks Relating to the Nature of the Invention 

The Applicants continue to dispute the Patent Office's characterization of the 
claimed invention. The invention relates to a method for assessing susceptibility to 
myocardial infarction that involves analysis of nucleic acid sequence in a person's FLAP 
gene. The elected claims are not drawn to polynucleotides, to polymorphisms, or to 
"differences," even though the Patent Office's patentability search may involve looking for 
such features in the prior art. Rather, the claims are drawn to methods that involve analyzing 
a human individual's DNA at a particular locus. The results of the analysis determine 
whether or not the individual is scored as having elevated risk for myocardial infarction. 
There is common utility for all variations of this method that are described in the application. 

III. The Rejection Under 35 U.S.C. § 112, First Paragraph, Alleging Lack of 
Enablement Should be Withdrawn 

In paragraph 6 of the Office action, the Patent Office rejected claims 61 and 
63-66, alleging lack of enabling disclosure. The Applicants traverse this rejection. 

The Applicants repeat by reference arguments made in their previous 
submissions. 

The rejection was based in part on alleged overbreadth insofar as the claims 
encompassed assessing susceptibility to both MI and stroke. Reference to stroke has been 
deleted by amendment, rendering moot this basis for rejection. 

Most of the remaining discussion of the issue of enabling disclosure in the 
Office action focuses on whether or not the association between FLAP haplotypes and MI 
taught in the application is reproducible. Accompanying this amendment is a sworn 
declaration summarizing a meta-analysis conducted by deCODE genetics, the assignee of this 
application. The meta-analysis shows that the correlation between FLAP haplotypes and MI 
is indeed reproducible. 

Importantly, the meta-analysis includes data from the Zee study cited by the 
Examiner as well as other published studies with available data analyzing the correlation 
between the FLAP haplotype and MI. The meta-analysis is an aggregation of data from 
smaller studies and shows that the correlation between the FLAP haplotype and increased 
risk for MI is real and statistically significant. Because of its large sample size, the 
meta-analysis has much greater statistical power, and is much more probative, than any individual 
smaller study, which may have failed to detect the correlation because of its small sample size. 
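
For illustration only, the pooling step of a meta-analysis can be sketched as follows. This sketch is not the analysis described in the declaration. It assumes a fixed-effect model with inverse-variance weighting of log odds ratios, approximates chromosome counts from the HapA haplotype frequencies and cohort sizes reported for the Icelandic and British cohorts in the attached Nature Genetics article, and uses hypothetical function names.

```python
import math

def log_odds_ratio(case_freq, n_cases, control_freq, n_controls):
    """Return (log OR, variance) from haplotype frequencies.

    Assumption: each individual contributes two chromosomes, so counts are
    approximated as frequency * 2 * n (estimated haplotypes, not observed).
    """
    a = case_freq * 2 * n_cases          # case chromosomes carrying the haplotype
    b = 2 * n_cases - a                  # case chromosomes without it
    c = control_freq * 2 * n_controls    # control chromosomes carrying it
    d = 2 * n_controls - c               # control chromosomes without it
    log_or = math.log((a * d) / (b * c))
    variance = 1 / a + 1 / b + 1 / c + 1 / d   # Woolf's approximation
    return log_or, variance

def fixed_effect_pool(studies):
    """Inverse-variance weighted mean of log odds ratios and its standard error."""
    weights = [1 / variance for _, variance in studies]
    total = sum(weights)
    pooled = sum(w * log_or for w, (log_or, _) in zip(weights, studies)) / total
    return pooled, math.sqrt(1 / total)

studies = [
    log_odds_ratio(0.158, 779, 0.095, 624),  # Icelandic cohort, HapA
    log_odds_ratio(0.168, 753, 0.151, 730),  # British cohort, HapA
]
pooled_log_or, pooled_se = fixed_effect_pool(studies)
pooled_or = math.exp(pooled_log_or)
ci_low = math.exp(pooled_log_or - 1.96 * pooled_se)
ci_high = math.exp(pooled_log_or + 1.96 * pooled_se)
print(f"pooled OR = {pooled_or:.2f} (95% CI {ci_low:.2f}-{ci_high:.2f})")
```

The pooled standard error is smaller than that of either cohort alone, which is the sense in which aggregation increases statistical power. A fixed-effect model ignores heterogeneity between cohorts, so a fuller analysis would also test for it.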

The Patent Office analyzes various articles or studies that pertain in some 
fashion to gene-disease correlation, but that are unrelated to the subject matter of the claims 
(e.g., Mayer et al., SNPs in the CADPKL gene and neurological disorders). These studies 
have no probative value with respect to FLAP-MI correlation, especially in view of the 
abundant data now available pertaining to FLAP-MI. The Applicants pointed to numerous 
defects in articles cited by the Patent Office or their failure to support the proposition for 
which they were cited, yet the PTO has continued to rely on the articles without addressing 
the defects. 

The Patent Office alleges (section titled "Guidance in the Specification") that 
the specification provides no evidence that the invention can be practiced "as broadly 
claimed" with respect to individual markers. This aspect of the Office action has no 
relevance to the elected claims, which pertain to a particular four-polymorphism haplotype. 
The haplotype of the claim is shown to have a statistically significant correlation with 
increased risk of MI in the application and in the meta-analysis referred to above. 

The Patent Office raised concerns about the proper wording of the claims as 
they pertain to individuals that do not have the tested-for HapA FLAP haplotype. These 
concerns are addressed above, and do not give rise to any questions of enabling disclosure. 

The Patent Office's final concern appears to relate to ethnic or "inter-ethnic" 
variability conferring different risks. Even if true, such variability does not give rise to 
questions of statutory enablement for the present claims. Statutory enablement involves 
whether an application describes an invention in a manner that allows those of ordinary skill 
to practice the invention. The present application teaches a person of ordinary skill how to 
perform the haplotype screen without regard to ethnicity, and teaches the conclusion that can 
reasonably be drawn from it based on population genetics. As with other correlation tests, 
the results provide helpful information for medical treatment or lifestyle management, and 
are indicative of risk at the population level. The present invention is appropriately claimed 
insofar as an individual is assessed for one type of data (FLAP haplotype) and a conclusion 
about susceptibility (supported by statistically validated data) is drawn based on the FLAP 
haplotype assessment only. The conclusion does not require ethnicity data. 

Human variability is the rule, not the exception, for all aspects of medicine, 
including diagnostic tests based on biochemistry; safety of drugs; efficacy of drugs; 
susceptibility to diseases; life expectancy, and so on. While it may be possible or desirable to 
refine any medical test or treatment or other medical procedure to an ethnicity or sub- 
ethnicity, that is not the current state of medicine and is not part of the statutory requirement 
of enablement for a claim that does not require a conclusion based on ethnicity. The 
Applicants' previous amendment cited many examples of diagnostic tests that are considered 
medically useful, even though their predictive value with respect to any particular person is 
not considered a certainty. The data in the application and the larger meta-analysis show that 
the test is valid and useful and provides another tool for assessing risk for MI. 



The Patent Office alleges that "as a general rule of thumb" the field looks for a 
relative risk of three or more before accepting a paper for publication. As explained in 
greater detail in the declaration filed herewith, RR > 3 is clearly NOT the standard in the field 
for accepting publications. (The inventor's paper was published in the prestigious journal Nature 
Genetics without RR > 3.) Nor is RR > 3 descriptive of the risk that would reasonably be 
attributed to multi-factorial diseases. Nor is RR > 3 a relevant indicator for statutory 
enablement. A conclusion of enablement is appropriate in view of the fact that the 
incremental risk for MI associated with FLAP HapA, though not nearly as high as 3.0, has 
been shown through large studies and meta-analysis of multiple studies to be statistically 
significant, and not an artifact of a small study. 

The Patent Office observed that the Helgadottir declaration inadvertently used 
the term "co-inventor" when "inventor" should have been used. The Applicants apologize 
for any confusion caused by this typographical error. In addition, the filed Helgadottir 
declaration omitted Exhibit G, which consists of the references Falk and Rubinstein and Terwilliger and 
Ott. These references are submitted herewith as Appendix B. 

CONCLUSION 

In view of the foregoing amendment and remarks, Applicants believe pending 
claims 61-66 are in condition for allowance and early notice thereof is solicited. 



Dated: October 16, 2007 




Registration No.: 48^84 
MARSHALL, GERSTEIN & BORUN LLP 
233 S. Wacker Drive, Suite 6300 
Sears Tower 

Chicago, Illinois 60606-6357 
(312) 474-6300 
Attorney for Applicant 






Nature Genetics 



The gene encoding 5-lipoxygenase activating protein 
confers risk of myocardial infarction and stroke 

Anna Helgadottir 1, Andrei Manolescu 1, Gudmar Thorleifsson 1, Solveig Gretarsdottir 1, Helga Jonsdottir 1, 
Unnur Thorsteinsdottir 1, Nilesh J Samani 2, Gudmundur Gudmundsson 1, Struan F A Grant 1, 
Gudmundur Thorgeirsson 3, Sigurlaug Sveinbjornsdottir 3, Einar M Valdimarsson 3, Stefan E Matthiasson 3, 
Halldor Johannsson 3, Olof Gudmundsdottir 1, Mark E Gurney 1, Jesus Sainz 1, Margret Thorhallsdottir 1, 
Margret Andresdottir 1, Michael L Frigge 1, Eric J Topol 4, Augustine Kong 1, Vilmundur Gudnason 5, 
Hakon Hakonarson 1, Jeffrey R Gulcher 1 & Kari Stefansson 1 






We mapped a gene predisposing to myocardial infarction to a locus on chromosome 13q12-13. A four-marker single-nucleotide 
polymorphism (SNP) haplotype in this locus spanning the gene ALOX5AP encoding 5-lipoxygenase activating protein (FLAP) is 
associated with a two times greater risk of myocardial infarction in Iceland. This haplotype also confers almost two times greater 
risk of stroke. Another ALOX5AP haplotype is associated with myocardial infarction in individuals from the UK. Stimulated 
neutrophils from individuals with myocardial infarction produce more leukotriene B4, a key product in the 5-lipoxygenase 
pathway, than do neutrophils from controls, and this difference is largely attributed to cells from males who carry the at-risk 
haplotype. We conclude that variants of ALOX5AP are involved in the pathogenesis of both myocardial infarction and stroke by 
increasing leukotriene production and inflammation in the arterial wall. 






Cardiovascular diseases (CVD) are the leading causes of death and disability 
in the developed world (ref. 1), with an increasing prevalence due to 
the aging of the population and the obesity epidemic. More than 
1 million deaths in the US alone were caused by myocardial infarction 
and stroke in 2003 (ref. 2). Some of the processes underlying myocardial 
infarction are now understood: it is generally attributed to atherosclerosis 
with arterial wall inflammation that ultimately leads to 
plaque rupture, fissure or erosion (refs. 3,4). This process is known to involve 
diapedesis of monocytes across the endothelial barrier; activation of 
neutrophils, macrophage cells and platelets; and release of a variety of 
cytokines and chemokines (refs. 5,6), but the genetic basis of the process has 
not yet been deciphered. 

Two different approaches have been used to search for genes associated 
with myocardial infarction. SNPs in candidate genes have been 
tested for association, but the associations have, in general, either not been 
replicated or confer only a modest risk of myocardial infarction. Case-control 
association studies have identified several proinflammatory genes with 
variants that are associated with either an increased risk of myocardial 
infarction or a protective effect (refs. 7-9). Four genome-wide scans in families 
with myocardial infarction have yielded several loci with formidable 
linkage peaks, but the gene(s) underlying these loci have not yet been 
identified (refs. 10-14). In addition, one large pedigree study identified a deletion 
mutation of a transcription factor gene, MEF2A, with autosomal 
dominant transmission (ref. 14). This is an interesting cause of myocardial 
infarction, but the prevalence of this or other mutations in MEF2A 
outside this family remains to be determined. 

Here we report a genome-wide scan of 296 multiplex Icelandic 
families including 713 individuals with myocardial infarction. 
Through suggestive linkage to a locus on chromosome 13q12-13, we 
identified the gene (ALOX5AP) encoding FLAP and found that a 
four-SNP haplotype in the gene confers a nearly two times greater 
risk of myocardial infarction and stroke. FLAP is a regulator (ref. 15) of a 
crucial pathway in the genesis of leukotriene inflammatory mediators, 
which are implicated in atherosclerosis both in a mouse 
model (ref. 16) and in human studies (refs. 17,18). Males had the strongest association 
to the at-risk haplotype, and male carriers of the at-risk haplotype 
also had significantly greater production of leukotriene-B4 
(LTB4), supporting the idea that proinflammatory activity has a role 
in the pathogenesis of myocardial infarction. We confirmed the asso- 
ciation of ALOX5AP with myocardial infarction in an independent 
cohort of British individuals with another haplotype. These results 
indicate that ALOX5AP is the first specific gene isolated that confers 
substantial population-attributable risk (PAR) of the complex traits 
of both myocardial infarction and stroke. 



1 deCODE genetics, Sturlugata 8, Reykjavik, Iceland. 2 Department of Cardiovascular Sciences, University of Leicester, Glenfield Hospital, Leicester, UK. 3 National 
University Hospital, Reykjavik, Iceland. 4 Cleveland Clinic Foundation, Cleveland, Ohio, USA. 5 Icelandic Heart Association, Reykjavik, Iceland. Correspondence should 
be addressed to K.S. (kstefans@decode.is). 

Published online 8 February 2004; doi:10.1038/ng1311 



NATURE GENETICS VOLUME 36 | NUMBER 3 | MARCH 2004 




RESULTS 
Linkage analysis 

We carried out a genome-wide scan in search of myocardial infarction 
susceptibility genes using a framework set of 1,068 microsatellite 
markers. The initial linkage analysis included 713 individuals with 
myocardial infarction who fulfilled the World Health Organization 
(WHO) MONICA research criteria 19 and were clustered in 296 
extended families. We repeated the linkage analysis for individuals 
with early onset, for males and for females separately. A description of 
the number of affected individuals and families in each analysis is 
provided in Supplementary Table 1 online, and the corresponding 
allele-sharing lod scores are given in Supplementary Figure 1 online. 
None of these analyses yielded a locus of genome-wide significance. 
The most promising lod score (2.86) was observed on chromosome 
13q12-13 for linkage with females with myocardial infarction at the 
peak marker D13S289 (Supplementary Fig. 1 online). This locus also 
had the most promising lod score (2.03) for individuals with early- 
onset myocardial infarction. After we increased the information on 
identity-by-descent sharing to over 90% by typing an additional 14 
microsatellite markers in a 30-cM region around D13S289, the lod 
score for the association in females dropped to 2.48 (P = 0.00036), 
and the lod score remained highest at D13S289 (Fig. 1a). In an independent 
linkage study of males with ischemic stroke or transient 
ischemic attack (TIA), we observed linkage to the same locus with a 
lod score of 1.51 at the same peak marker (Supplementary Fig. 2 
online), further suggesting that a cardiovascular susceptibility factor 
might reside at this locus. 
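
The relationship between a lod score and its P value can be sketched as follows. This is a minimal illustration, not necessarily the authors' calculation: it assumes the statistic 2 ln(10) x lod follows a chi-square distribution with one degree of freedom and that the test is one-sided. The function name `lod_to_p` is hypothetical.

```python
import math

def lod_to_p(lod):
    """One-sided P value for a lod score.

    Assumption: 2*ln(10)*lod is chi-square distributed with 1 d.f., so its
    square root is a standard normal deviate and the one-sided P value is
    the upper normal tail.
    """
    z = math.sqrt(2 * math.log(10) * lod)
    return 0.5 * math.erfc(z / math.sqrt(2))

p_females = lod_to_p(2.48)
print(f"lod 2.48 -> P = {p_females:.5f}")
```

Under these assumptions a lod score of 2.48 corresponds to a P value of about 0.00036, in agreement with the value quoted in the text.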

Figure 1 Schematic view of the chromosome 13 linkage region showing 
ALOX5AP. (a) The linkage scan for females with myocardial infarction and 
the one-lod drop region that includes ALOX5AP. (b) Microsatellite 
association for all individuals with myocardial infarction: single-marker 
association (black dots) and two-, three-, four- and five-marker haplotype 
association (black, blue, green and red horizontal lines, respectively). The 
blue and red arrows indicate the location of the most significant haplotype 
association across ALOX5AP in males and females, respectively. (c) 
ALOX5AP gene structure, with exons shown as colored cylinders, and the 
locations of all SNPs typed in the region. The green vertical lines indicate 
the position of the microsatellites (b) and SNPs (c) used in the analysis. 

Microsatellite association study 

The 7.6-Mb region that corresponds to a drop of 1 in lod score in the 
female-myocardial infarction linkage analysis contains 40 known 
genes (Supplementary Table 2 online). To determine which gene in 
this region was most likely to contribute to myocardial infarction, we 
typed 120 microsatellite markers in the region and carried out a case- 
control association study using 802 unrelated (separated by at least 
three meioses) individuals with myocardial infarction and 837 popu- 
lation-based controls. We also repeated the association study for each 
of the three phenotypes that were used in the linkage study: individu- 
als with early onset, males and females with myocardial infarction. In 
addition to testing each marker individually, we also tested haplo- 
types based on these markers for association. To limit the number of 
haplotypes tested, we considered only haplotypes spanning less than 
300 kb that were over-represented among the affected individuals. 

The haplotype with the strongest association to myocardial infarc- 
tion (P = 0.00004) covered a region that contains two known genes: 
ALOX5AP (Fig. 1b) and a gene with an unknown function called 
highly charged protein (D13S106E). The haplotype association in this 
region for females with myocardial infarction was less significant (P = 
0.0004) than for all individuals with myocardial infarction, and the 
most significant haplotype association was observed for males with 
myocardial infarction (P = 0.000002). The haplotype associated with 
males with myocardial infarction was the only haplotype that retained 
significant association after adjusting for all haplotypes tested. 

FLAP, together with 5-lipoxygenase (5-LO), is a regulator of the 
leukotriene biosynthetic pathway that has recently been implicated in 
the pathogenesis of atherosclerosis (refs. 16-18). Therefore, ALOX5AP was a 
good candidate for the gene underlying the association with myocardial 
infarction. 

Screening for SNPs in ALOX5AP and LD mapping 

To determine whether variations in ALOX5AP significantly associate 
with myocardial infarction and to search for causal variations, we 
sequenced ALOX5AP in 93 affected individuals and 93 controls. The 
sequenced region covers 60 kb containing ALOX5AP, including the 
five known exons and introns, the 26-kb region 5' to the first exon and 
the 7-kb region 3' to the fifth exon. We identified 144 SNPs, of which 
we excluded 96 from further analysis owing to either a low minor allele 
frequency or complete correlation (redundancy) with other SNPs. 
Figure 1c shows the distribution of the 48 SNPs chosen for genotyping, 
relative to exons, introns and the 5' and 3' flanking regions of 
ALOX5AP. We identified only one SNP in a coding sequence (exon 2), 
which did not lead to an amino acid substitution. The locations of the 
48 SNPs in the National Center for Biotechnology Information human 
genome assembly build 34 are listed in Supplementary Table 3 online. 
In addition to the SNPs, we typed a polymorphism consisting of a 
monopolymer A repeat in the ALOX5AP promoter region (ref. 20). 

The linkage disequilibrium (LD) block structure defined by the 48 
genotyped SNPs is shown in Figure 2. Strong LD was detected across 
the ALOX5AP region, although at least one historical recombination 
seems to have occurred, dividing the region into two strongly corre- 
lated LD blocks. 
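
As an illustration of the pairwise LD measure D', the following sketch computes it for two biallelic markers from haplotype and allele frequencies. The input frequencies are hypothetical values chosen for the example, not data from this study, and the function name `d_prime` is likewise hypothetical.

```python
def d_prime(p_ab, p_a, p_b):
    """Normalized linkage disequilibrium |D'| for two biallelic markers.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b                      # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return abs(d) / d_max if d_max > 0 else 0.0

# Hypothetical frequencies: allele B occurs only on chromosomes carrying A.
complete_ld = d_prime(p_ab=0.2, p_a=0.3, p_b=0.2)
# Hypothetical frequencies: the two markers are statistically independent.
no_ld = d_prime(p_ab=0.06, p_a=0.3, p_b=0.2)
print(complete_ld, no_ld)
```

A value of D' near 1 means that no historical recombination between the two markers is evident in the sample, while a value near 0 indicates independence.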




Figure 2 Pairwise LD between SNPs in a 60-kb region encompassing 
ALOX5AP. The markers are plotted equidistantly. Two measures of LD are 
shown: D' in the upper left triangle and P values in the lower right triangle. 
Colored lines indicate the positions of the exons of ALOX5AP, and the green 
stars indicate the location of the markers of the at-risk haplotype HapA. 
Scales for both measures of the LD strength are provided on the right. 



Haplotype association with myocardial infarction 

In a case-control association study, we genotyped the 48 selected SNPs 
and the monopolymer A repeat marker in a set of 779 unrelated indi- 
viduals with myocardial infarction and 624 population-based con- 
trols. We tested each of the 49 markers individually for association 
with the disease. Three SNPs, one located 3 kb upstream of the first 
exon and the other two 1 kb and 3 kb downstream of the first exon, 
showed nominally significant association to myocardial infarction 
(Supplementary Table 4 online). After adjusting for the number of 
markers tested, however, these results were not significant. We then 
searched for haplotypes associated with the disease using the same 
cohorts. We limited the search to haplotype combinations constructed 
from two, three or four SNPs and tested only haplotypes that were 
over-represented in the individuals with myocardial infarction. The 
resulting P values were adjusted for all the haplotypes we tested by randomizing 
the affected individuals and controls. 
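
Adjustment by randomization can be sketched as a max-statistic permutation test. This is a minimal illustration under stated assumptions, not the authors' procedure: the test statistic (excess carrier frequency in cases), the synthetic carrier data, and all function names are hypothetical.

```python
import random

def max_excess(labels, carriers):
    """Largest case-minus-control carrier-frequency difference over all haplotypes."""
    n_case = sum(labels)
    n_ctrl = len(labels) - n_case
    best = float("-inf")
    for column in carriers:                    # one list of 0/1 carrier flags per haplotype
        case_hits = sum(c for c, is_case in zip(column, labels) if is_case)
        ctrl_hits = sum(column) - case_hits
        best = max(best, case_hits / n_case - ctrl_hits / n_ctrl)
    return best

def adjusted_p(labels, carriers, n_perm=1000, seed=0):
    """P value for the best haplotype, adjusted for all haplotypes tested."""
    rng = random.Random(seed)
    observed = max_excess(labels, carriers)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)                  # randomize case/control status
        if max_excess(shuffled, carriers) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Synthetic example: 200 cases then 200 controls, one enriched haplotype, four noise haplotypes.
rng = random.Random(1)
labels = [1] * 200 + [0] * 200
enriched = [1] * 80 + [0] * 120 + [1] * 30 + [0] * 170
noise = [[int(rng.random() < 0.2) for _ in labels] for _ in range(4)]
p_adj = adjusted_p(labels, [enriched] + noise)
print(f"adjusted P = {p_adj:.4f}")
```

Because each shuffled data set is scored by its best haplotype, the resulting P value accounts for the number of haplotypes tested and for correlation among them.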

Several haplotypes were significantly associated with the disease at 
an adjusted significance level of P < 0.05 (Supplementary Table 5 
online). We observed the most significant 
association with a four-SNP haplotype span- 
ning 33 kb, including the first four exons of 
ALOX5AP (Fig. 1c), with a nominal P value of 
0.0000023 and an adjusted P value of 0.005. 
This haplotype, called HapA, has a haplotype 
frequency of 15.8% (carrier frequency 29.1%) 
in affected individuals versus 9.5% (carrier 
frequency 18.1%) in controls (Table 1). The 
relative risk conferred by HapA compared 
with other haplotypes constructed from the 
same SNPs, assuming a multiplicative model, 
was 1.8 and the corresponding PAR was 
13.5%. HapA was present at a higher fre- 
quency in males (carrier frequency 30.9%) 
than in females with myocardial infarction 
(carrier frequency 25.7%; Table 1). All other 
haplotypes that were significantly associated 
with an adjusted P value less than 0.05 were 
highly correlated with HapA and should be considered variants of that 
haplotype (Supplementary Table 5 online). 
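
The figures quoted above for HapA can be approximately reproduced from the two haplotype frequencies. The sketch below is an illustration under stated assumptions rather than the authors' code: it assumes random pairing of chromosomes for the carrier frequency, a multiplicative model in which genotype risks are 1, RR and RR squared, and one common formula for PAR under that model. Function names are hypothetical.

```python
def carrier_frequency(hap_freq):
    """Fraction of individuals carrying at least one copy (random pairing assumed)."""
    return 1 - (1 - hap_freq) ** 2

def relative_risk(case_freq, control_freq):
    """Per-copy relative risk under a multiplicative model (allelic odds ratio)."""
    return (case_freq * (1 - control_freq)) / (control_freq * (1 - case_freq))

def population_attributable_risk(control_freq, rr):
    """PAR when genotype risks are 1, rr and rr**2 (multiplicative model)."""
    return 1 - 1 / (1 + control_freq * (rr - 1)) ** 2

case_freq, control_freq = 0.158, 0.095        # HapA frequencies reported in the text
rr = relative_risk(case_freq, control_freq)
par = population_attributable_risk(control_freq, rr)
print(f"carriers: {carrier_frequency(case_freq):.3f} vs {carrier_frequency(control_freq):.3f}")
print(f"RR = {rr:.2f}, PAR = {par:.3f}")
```

With these inputs the sketch gives carrier frequencies of about 29.1% and 18.1%, a relative risk of about 1.8 and a PAR of about 13.4%, close to the reported values. The small difference in PAR presumably reflects rounding of the published frequencies.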

Association of HapA with stroke and PAOD 

Because of the high degree of comorbidity among myocardial infarc- 
tion, stroke and peripheral arterial occlusive disease (PAOD), with 
most of these cases occurring on the basis of an atherosclerotic disease, 
we wanted to determine whether HapA was also associated with stroke 
or PAOD. We typed the SNPs defining HapA for these cohorts. We 
removed first- and second-degree relatives and all known cases of 
myocardial infarction and tested for association in 702 individuals 
with stroke and 577 individuals with PAOD (Table 1). We observed a 
significant association of HapA with stroke, with a relative risk of 1.67 
(P = 0.000095). In addition, we determined whether HapA was pri- 
marily associated with a particular subphenotype of stroke and found 
that both ischemic and hemorrhagic stroke were significantly associ- 
ated with HapA (Supplementary Table 6 online). Finally, although 
HapA was more frequent in the PAOD cohort than in the population 
controls (Table 1), this was not significant. Similar to the stronger 
association of HapA with males with myocardial infarction than with 
females with myocardial infarction, HapA also showed stronger asso- 
ciation with males than with females with stroke and PAOD (Table 1). 

Haplotype association in a British cohort 

In an independent study, we determined whether variants in 
ALOX5AP also affected the risk of myocardial infarction in a popula- 
tion outside Iceland. We typed SNPs defining HapA in a cohort of 753 
individuals from the UK who had sporadic myocardial infarction and 
in 730 British population controls. The affected individuals and con- 
trols were from three separate study cohorts recruited in Leicester and 
Sheffield. We found a slightly higher frequency of HapA in affected 
individuals versus controls (16.8% versus 15.1%, respectively), but the 
results were not statistically significant. As in the Icelandic population, 
HapA was more common in males with myocardial infarction (carrier 
frequency 31.7%) than in females with myocardial infarction (carrier 
frequency 28.0%). When we typed an additional nine SNPs, distributed 
across ALOX5AP, in the British cohort and searched for other 
haplotypes that might be associated with myocardial infarction, two 
SNPs showed association to myocardial infarction with a nominally 
significant P value (data not shown). Moreover, three- and four-SNP 
haplotype combinations were associated with higher risk of myocardial 
infarction in the British cohort, and we observed the most significant 
association for a four-SNP haplotype with a nominal P value of 
0.00037 (Table 2). We call this haplotype HapB. The haplotype frequency 
of HapB was 7.5% in the individuals with myocardial infarction 
(carrier frequency 14.4%) compared with 4.0% (carrier 
frequency 7.8%) in controls, conferring a relative risk of 1.95 (Table 
2). This association of HapB remained significant after adjusting for 
all haplotypes tested, using 1,000 randomization steps, with an 
adjusted P = 0.046. No other SNP haplotype had an adjusted P value 
<0.05. The two at-risk haplotypes, HapA and HapB, are mutually 
exclusive; there are no instances in which the same chromosome carries 
both haplotypes. 

Table 1 Association of HapA with myocardial infarction, stroke and PAOD 

Phenotype (n)                  Frequency   RR     PAR     P value      P value (a) 
Myocardial infarction (779)    0.158       1.80   0.135   0.0000023    0.005 
  Males (486)                  0.169       1.95   0.158   0.00000091   ND 
  Females (293)                0.138       1.53   0.094   0.0098       ND 
  Early onset (358)            0.139       1.53   0.094   0.0058       ND 
Stroke (702) (b)               0.149       1.67   0.116   0.000095     ND 
  Males (373)                  0.156       1.76   0.131   0.00018      ND 
  Females (329)                0.141       1.55   0.098   0.0074       ND 
PAOD (577) (b)                 0.122       1.31   0.056   0.061        ND 
  Males (356)                  0.126       1.36   0.065   0.057        ND 
  Females (221)                0.114       1.22   0.041   0.31         ND 

(a) P value adjusted for the number of haplotypes tested. (b) Excluding known cases of myocardial infarction. 
Shown is HapA of ALOX5AP and the corresponding number of affected individuals (n), the haplotype frequency in 
affected individuals, the relative risk (RR), PAR and P values. HapA is defined by the SNPs SG13S25, SG13S114, 
SG13S89 and SG13S32 (Supplementary Table 5 online). The same controls (n = 624) were used for the association 
analysis in myocardial infarction, stroke and PAOD as well as for the analysis of males, females and individuals with 
early onset. The frequency of HapA in the control cohort is 0.095. ND, not done. 

Table 2 Association of HapB with myocardial infarction in British individuals 

Phenotype (n)                  Frequency   RR     PAR     P value      P value (a) 
Myocardial infarction (753)    0.075       1.95   0.072   0.00037      0.046 
  Males (549)                  0.075       1.97   0.072   0.00093      ND 
  Females (204)                0.073       1.90   0.068   0.021        ND 

(a) P value adjusted for the number of haplotypes tested using 1,000 randomization tests. 
Shown are the results for HapB that shows the strongest association in the British myocardial infarction cohort. HapB 
is defined by the SNPs SG13S377, SG13S114, SG13S41 and SG13S35, which have the alleles A, A, A and G, 
respectively. In all three phenotypes shown, the same set of 730 British controls was used and the frequency of HapB 
in the control cohort is 0.040. Number of affected individuals (n), haplotype frequency in affected individuals, 
relative risk (RR) and PAR are indicated. ND, not done. 

Figure 3 LTB4 production of ionomycin-stimulated neutrophils from 
individuals with myocardial infarction (n = 41) and controls (n = 35). The log-transformed 
(mean ± s.d.) values measured at 15 and 30 min in stimulated 
cells are shown. (a) LTB4 production in individuals with myocardial infarction 
(MI) and controls. The difference in the mean values between affected 
individuals and controls was tested using a two-sample t-test of the log-transformed 
values. (b) LTB4 production in males with myocardial infarction 
carrying HapA (red bars) and not carrying HapA (white bars). Mean values of 
controls (blue bars) are included for comparison. Males with HapA produced 
the highest amounts of LTB4 (P < 0.005 compared with controls). Data for 
females are shown in Supplementary Table 7 online. 

More LTB4 in individuals with myocardial infarction 

To determine whether individuals with a past history of myocardial 
infarction had greater activity of the 5-LO pathway than controls, we 
measured production of LTB4 (a key product of the 5-LO pathway) 
in blood neutrophils isolated from Icelandic individuals with 
myocardial infarction and controls before and after stimulation with 
the calcium ionophore ionomycin. We detected no difference in 
LTB4 production in resting neutrophils from 
individuals with myocardial infarction versus 
controls. In contrast, LTB4 generation by 
neutrophils stimulated with ionomycin was 
substantially greater in individuals with 
myocardial infarction than in controls after 
15 and 30 min, respectively (Fig. 3a). 
Moreover, the observed difference in release 
of LTB4 was largely accounted for by male 
carriers of HapA (Fig. 3b), whose cells pro- 
duced significantly more LTB4 than cells 
from controls (P = 0.0042; Supplementary 
Table 7 online). There was also a heightened LTB4 response in males 
who did not carry HapA, but this difference was of borderline signif- 
icance (Supplementary Table 7 online). This could be explained by 
additional variants in ALOX5AP that have not been uncovered, or in 
other genes belonging to the 5-LO pathway, that may account for 
upregulation of the LTB4 response in some individuals without the 
ALOX5AP at-risk haplotype. We did not detect differences in LTB4 
response in females (Supplementary Table 7 online), but because of 
the small sample size, this result is not conclusive. The elevated levels
of LTB4 production in stimulated neutrophils from male carriers of 
the at-risk haplotype suggest that the disease-associated variants of 
ALOX5AP heighten the response of FLAP to factors that stimulate 
inflammatory cells. 

DISCUSSION 

Our results show that variants of ALOX5AP encoding FLAP are asso- 
ciated with greater risk of myocardial infarction and stroke. In our 
Icelandic cohort, a haplotype that spans ALOX5AP is carried by 
29.1% of all individuals with myocardial infarction and almost dou- 
bles the risk of myocardial infarction. We then replicated these find- 
ings in an independent cohort of individuals with stroke. 
Furthermore, stimulated neutrophils from individuals with myocar- 
dial infarction had greater production of LTB4, one of the key prod- 
ucts of the 5-LO pathway. When we examined this in the context of 
the at-risk haplotype, however, the gain of function was largely 
attributed to male carriers of the at-risk haplotype, who also had the 
strongest association with the ALOX5AP haplotype. Another haplo- 
type spanning ALOX5AP was associated with myocardial infarction 
in a British cohort. Although the pathogenic variants responsible for 
the effects associated with the disease haplotypes are unknown, the 
greater production of LTB4 observed in ionomycin-stimulated neu- 
trophils from male carriers of the at-risk haplotype suggests that the 
disease-associated variants increase the response of FLAP to factors 
that stimulate inflammatory cells. 

We observed suggestive linkage to chromosome 13q12-13 with
several different phenotypic groups, including females with myocar- 
dial infarction, individuals of both sexes with early-onset myocardial 
infarction and males with ischemic stroke or TIA. But we observed 
the strongest haplotype association for males with myocardial 
infarction or stroke. Therefore, the linkage signal in females with 
myocardial infarction and in individuals with early-onset myocardial 
infarction is not explained by the at-risk haplotype that we identi- 
fied, and we expect that there may be other unidentified variants or 
haplotypes in ALOX5AP, or in other genes in the linkage region, that 
may confer risk of these cardiovascular phenotypes. These variants 
are probably rarer than HapA with relatively high penetrance, higher 
in women than in men. 

FLAP has an important role in the initial steps of leukotriene 
biosynthesis 15 , which is largely confined to leukocytes and can be
triggered by a variety of stimuli. In this biosynthetic pathway, unesterified
arachidonic acid is converted to LTA4 by the action of 5-LO
and its activating protein FLAP. The unstable epoxide LTA4 is fur- 
ther metabolized to LTB4 or LTC4 by LTA4 hydrolase and LTC4 syn- 
thase, respectively. In addition, LTA4 can be exported to 
neighboring cells that are devoid of 5-LO activity and become subject
to transcellular leukotriene biosynthesis 21-23 . The leukotrienes
have a variety of proinflammatory effects 24,25 . LTB4 activates leuko- 
cytes, leading to chemotaxis and increased adhesion of leukocytes to 
vascular endothelium, release of lysosomal enzymes such as 
myeloperoxidase and production of superoxide anions 25 . The cys- 
teinyl-containing leukotrienes (LTC4 and its metabolites LTD4 and 
LTE4) increase vascular permeability in postcapillary venules and 
are potent vasoconstrictors of coronary arteries 26-28 .

The importance of the 5-LO pathway is well established in 
asthma, and drugs inhibiting this pathway have been developed for 
treating asthma. The role of the 5-LO pathway in the pathogenesis 
of atherosclerosis has recently received attention. A study of post- 
mortem pathologic specimens showed an increase in the expression 
of members of the 5-LO pathway, including 5-LO and FLAP, in ath- 
erosclerotic lesions at various stages of development in the aorta, 
coronary arteries and carotid arteries 18 . Furthermore, 5-LO was 
localized to macrophages, dendritic cells, foam cells, mast cells and 
neutrophilic granulocytes, and the number of cells expressing 5-LO 
was markedly greater in advanced lesions 18 . The leukocytes positive 
for 5-LO accumulated at distinct sites that are most prone to rup- 
ture 29 , such as the shoulder regions below the fibrous cap of the ath- 
erosclerotic lesion 18 . A 5-LO promoter variant is associated with 
abnormal carotid artery intima-media thickness and heightened 
inflammatory biomarkers 30 . In addition, antagonists of LTB4 block 
the development of atherosclerosis in apo-E-deficient and LDLR-deficient
mice 31 , and a congenic mouse strain with a heterozygous
deficiency of 5-LO shows resistance to atherosclerosis 16 , further 
supporting the idea that greater activity of the 5-LO pathway has a 
role in predisposition to atherosclerosis. 

Our data also show that the at-risk haplotype of ALOX5AP has 
higher frequency in all subgroups of stroke, including ischemic stroke, 
TIA and hemorrhagic stroke. HapA confers significantly higher risk of
myocardial infarction and stroke than it does of PAOD. This could be 
explained by differences in the pathogenesis of these diseases. Unlike 
individuals with PAOD, who have ischemic legs because of atheroscle- 
rotic lesions that are responsible for gradually diminishing blood flow 
to the legs, individuals with myocardial infarction and stroke have suf- 
fered acute events, with disruption of the vessel wall suddenly decreas- 
ing blood flow to regions of the heart and the brain. 

We did not find association between HapA and myocardial infarc- 
tion in a British cohort, but we did find significant association between 
myocardial infarction and a different ALOX5AP variant. The existence 
of different haplotypes of the gene conferring risk to myocardial 
infarction in different populations is not unexpected. It is not unrea- 
sonable to assume that a common disease like myocardial infarction is 
associated with many different mutations or sequence variations and 
that the frequencies of these disease-associated variants may differ 
between populations. It would also not be unexpected for the same 
mutation to arise on different haplotypic backgrounds. 

Our work suggests that ALOX5AP has an important role in the 
pathogenesis of myocardial infarction and stroke in humans. Our 
study, together with others, may provide the necessary background to 
launch therapeutic trials to determine whether pharmacological inhi- 
bition of FLAP will prevent the development of myocardial infarction 
and stroke. 



METHODS 

Study population. We recruited the individuals in the study from a registry of 
over 8,000 individuals, which includes all individuals who had myocardial 
infarctions before the age of 75 in Iceland from 1981 to 2000. This registry is a 
part of the WHO MONICA Project 19 . Diagnoses of all individuals in the registry
follow strict diagnostic rules based on signs, symptoms, electrocardiograms,
cardiac enzymes and necropsy findings.

We used genotypes from 713 individuals with myocardial infarction and
1,741 of their first-degree relatives in the linkage analysis. For the microsatellite
association study of the locus associated with myocardial infarction, we used
802 unrelated (no first- or second-degree relatives) individuals with myocardial
infarction (233 females, 624 males and 302 with early onset) and 837 population-based
controls. The females studied were post-menopausal. Over 90% of
the individuals were taking aspirin or other nonsteroidal anti-inflammatory
drugs. For the SNP association study in and around ALOX5AP, we genotyped
779 unrelated individuals with myocardial infarction (293 females, 486 males
and 358 with early onset). The control group for the SNP association study was
population-based and comprised 624 unrelated males and females 20-90
years of age whose medical history was unknown. The stroke and PAOD
cohorts used in this study have previously been described 32-34 . For the stroke 
linkage analysis, we used genotypes from 342 males with ischemic stroke or TIA 
that were linked to at least one other male within and including six meioses in 
164 families. For the association studies, we analyzed 702 individuals with all 
forms of stroke (329 females and 373 males) and 577 individuals with PAOD 
(221 females and 356 males). Individuals with stroke or PAOD who also had 
myocardial infarction were excluded. Controls used for the stroke and PAOD 
association studies were the same as used in the myocardial infarction SNP 
association study. 

The study was approved by the Data Protection Commission of Iceland and 
the National Bioethics Committee of Iceland. We obtained informed consent 
from all study participants. Personal identifiers associated with medical infor- 
mation and blood samples were encrypted with a third-party encryption sys- 
tem as previously described 35 . 

Statistical analysis. We carried out a genome-wide scan as previously 
described 33 , using a set of 1,068 microsatellite markers. We used multipoint, 
affected-only allele-sharing methods 36 to assess the evidence for linkage. All 
results were obtained using the program Allegro 37 and the deCODE genetic 
map 38 . We used the S_pairs scoring function 39,40 and the exponential allele-sharing
model 36 to generate the relevant 1-degree-of-freedom statistics. When
combining the family scores to obtain an overall score, we used a weighting 
scheme that is halfway on a log scale between weighting each affected pair 
equally and weighting each family equally. In the analysis, all genotyped individuals
who were not affected were treated as 'unknown'. Because of concern
with small-sample behavior, we usually computed corresponding P values in 
two different ways for comparison and report the less significant one. The first 
P value was computed based on large sample theory, $Z_{lr} = \sqrt{2 \log_e(10)\,\mathrm{lod}}$, and
is distributed approximately as a standard normal distribution under the null 
hypothesis of no linkage 36 . A second P value was computed by comparing the 
observed lod score with its complete data sampling distribution under the null 
hypothesis 37 . When a data set consisted of more than a handful of families, 
these two P values tended to be very similar. The information measure we used, 
which is implemented in Allegro, is closely related to a classical measure of 
information and has a property that is between 0 (if the marker genotypes are 
completely uninformative) and 1 (if the genotypes determine the exact amount 
of allele sharing by descent among the affected relatives) 41,42 .
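The large-sample conversion described above, from a lod score to an approximately standard normal deviate, can be sketched in a few lines. This is only an illustration: the function names and the example lod score of 3.0 are hypothetical, and the use of the upper tail of the normal distribution for the P value is an assumption, not a statement of how Allegro reports it.

```python
import math


def z_lr_from_lod(lod: float) -> float:
    """Convert a base-10 lod score to Z_lr = sqrt(2 * ln(10) * lod)."""
    if lod < 0:
        raise ValueError("lod must be non-negative")
    return math.sqrt(2.0 * math.log(10.0) * lod)


def upper_tail_p(z: float) -> float:
    """Upper-tail probability of a standard normal deviate (assumed one-sided)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Hypothetical example value, not taken from the study.
example_lod = 3.0
example_z = z_lr_from_lod(example_lod)
example_p = upper_tail_p(example_z)
print(f"lod = {example_lod:.2f}, Z_lr = {example_z:.3f}, P = {example_p:.2e}")
```

With few families the normal approximation may be poor, which is why the text also compares the lod score with its sampling distribution under the null hypothesis and reports the less significant of the two P values.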

For single-marker association studies, we used Fisher's exact test to calculate 
two-sided P values for each allele. All P values are unadjusted for multiple comparisons
unless specifically indicated. We present allelic rather than carrier frequencies
for microsatellites, SNPs and haplotypes. To minimize any bias due to
the relatedness of the individuals who were recruited as families for the linkage 
analysis, we eliminated first- and second-degree relatives. For the haplotype 
analysis we used the program NEMO 32 , which handles missing genotypes and 
uncertainty with phase through a likelihood procedure, using the expectation- 
maximization algorithm as a computational tool to estimate haplotype fre- 
quencies. Under the null hypothesis, the affected individuals and controls were 
assumed to have identical haplotype frequencies. Under the alternative
hypotheses, the candidate at-risk haplotype was allowed to have a higher frequency
in the affected individuals than in controls, and the ratios of frequencies
of all other haplotypes were assumed to be the same in both groups. 
Likelihoods were maximized separately under both hypotheses, and a corresponding
1-degree-of-freedom likelihood ratio statistic was used to evaluate
statistical significance 32 . Although we only searched for haplotypes that 
increased the risk, all reported P values are two-sided unless otherwise stated. 
To assess the significance of the haplotype association corrected for multiple 
testing, we carried out a randomization test using the same genotype data. We 
randomized the cohorts of affected individuals and controls and repeated the 
analysis. This procedure was repeated up to 1,000 times, and the P value we present
is the fraction of replications that produced a P value for a haplotype tested
that was lower than or equal to the P value we observed using the original 
affected individual and control cohorts. 
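The randomization procedure for multiple-testing correction can be illustrated with a minimal sketch. It is not the NEMO analysis: as a stand-in for the likelihood ratio statistic, it uses a Pearson chi-square on carrier counts, and it compares the largest statistic across haplotypes (equivalent to comparing the smallest P value when every test has 1 degree of freedom). The data layout, function names and toy data are all hypothetical.

```python
import random


def chi2_2x2(a, b, c, d):
    """Pearson chi-square (1 d.f.) for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # a degenerate table carries no evidence of association
    return n * (a * d - b * c) ** 2 / denom


def best_statistic(carriers, is_case):
    """Largest chi-square over all tested haplotypes.

    carriers[i][k] is 1 if individual i carries haplotype k, else 0;
    is_case[i] is True for affected individuals and False for controls.
    """
    best = 0.0
    for k in range(len(carriers[0])):
        a = sum(1 for row, y in zip(carriers, is_case) if y and row[k])
        b = sum(1 for row, y in zip(carriers, is_case) if y and not row[k])
        c = sum(1 for row, y in zip(carriers, is_case) if not y and row[k])
        d = sum(1 for row, y in zip(carriers, is_case) if not y and not row[k])
        best = max(best, chi2_2x2(a, b, c, d))
    return best


def randomization_p(carriers, is_case, n_rep=1000, seed=0):
    """Fraction of label shuffles whose best statistic matches or beats the observed one."""
    rng = random.Random(seed)
    observed = best_statistic(carriers, is_case)
    labels = list(is_case)
    hits = 0
    for _ in range(n_rep):
        rng.shuffle(labels)  # randomize affected/control status
        if best_statistic(carriers, labels) >= observed:
            hits += 1
    return hits / n_rep


# Toy data (hypothetical): 20 cases that all carry haplotype 0, 20 controls that do not.
toy_carriers = [[1, 0]] * 20 + [[0, 0]] * 20
toy_status = [True] * 20 + [False] * 20
print(randomization_p(toy_carriers, toy_status, n_rep=200))
```

Because the whole search over haplotypes is repeated inside each shuffle, the resulting P value accounts for the number of haplotypes tested without assuming that the tests are independent.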

For both single-marker and haplotype analysis, we calculated relative risk 
(RR) and PAR assuming a multiplicative model 43,44 in which the risk of the two
alleles of haplotypes a person carries multiply. We calculated LD between pairs
of SNPs using the standard definition of D′ (ref. 45) and R² (ref. 46). Using
NEMO, we estimated frequencies of the two marker allele combinations by 
maximum likelihood and evaluated deviation from linkage equilibrium by a 
likelihood ratio test. When plotting all SNP combinations to elucidate the LD 
structure in a particular region, we plotted D′ in the upper left corner and the P
value in the lower right corner. In the LD plots we present, the markers are plot- 
ted equidistantly rather than according to their physical positions. 
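For readers unfamiliar with the two LD measures, the following sketch computes them from the four haplotype frequencies of a pair of biallelic SNPs, using the standard textbook definitions. In the study these frequencies were estimated by maximum likelihood with NEMO; here they are simply supplied, and the function name, argument names and example frequencies are hypothetical.

```python
def ld_from_haplotype_freqs(f11, f10, f01, f00):
    """Return (|D'|, r^2) for two biallelic SNPs.

    f11, f10, f01 and f00 are the frequencies of the four two-SNP haplotypes,
    where the first index is the allele at SNP 1 and the second at SNP 2.
    """
    total = f11 + f10 + f01 + f00
    if abs(total - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    p1 = f11 + f10  # frequency of allele 1 at the first SNP
    q1 = f11 + f01  # frequency of allele 1 at the second SNP
    if not (0.0 < p1 < 1.0 and 0.0 < q1 < 1.0):
        raise ValueError("both SNPs must be polymorphic")
    d = f11 - p1 * q1  # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p1 * (1.0 - q1), (1.0 - p1) * q1)
    else:
        d_max = min(p1 * q1, (1.0 - p1) * (1.0 - q1))
    d_prime = abs(d) / d_max
    r_squared = d * d / (p1 * (1.0 - p1) * q1 * (1.0 - q1))
    return d_prime, r_squared


# Hypothetical frequencies: one of the four haplotypes is absent.
d_prime, r_squared = ld_from_haplotype_freqs(0.5, 0.0, 0.25, 0.25)
print(f"D' = {d_prime:.2f}, r^2 = {r_squared:.3f}")
```

The example shows why both measures are reported: when one haplotype is missing, D′ reaches 1 even though the allele frequencies differ and r² stays well below 1.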

Identification of DNA polymorphisms. We identified new polymorphic 
repeats (dinucleotide or trinucleotide repeats) with the Sputnik program. We 
subtracted the lower allele of the CEPH sample 1347-02 (CEPH genomics 
repository) from the alleles of the microsatellites and used it as a reference. We 
detected SNPs in the gene by PCR sequencing exonic and intronic regions from 
affected individuals and controls. We also detected public polymorphisms by 
BLAST search of the National Center for Biotechnology Information SNP data- 
base. We genotyped SNPs using a method for detecting SNPs with fluorescent 
polarization template-directed dye-terminator incorporation 47 and TaqMan
assays (Applied Biosystems). 

Isolation and activation of peripheral blood neutrophils. We drew 50 ml of 
blood from each of 41 individuals with myocardial infarction and 35 age- and 
sex-matched controls into vacutainers containing EDTA. All blood was drawn 
at the same time in the early morning after 12 h of fasting. We isolated neutrophils
using Ficoll-Paque PLUS (Amersham Biosciences).

We collected the red cell pellets from the Ficoll gradient and then lysed red 
blood cells in 0.165 M ammonium chloride for 10 min on ice. After washing 
them with phosphate-buffered saline, we counted neutrophils and plated them 
at 2 × 10^6 cells ml^-1 in 4-ml cultures of 15% fetal calf serum (GIBCO BRL) in
RPMI-1640 medium (GIBCO BRL). We then stimulated cells with the maximum
effective concentration of ionomycin (1 µM). At 0, 15, 30 and 60 min after adding
ionomycin, we aspirated 600 µl of culture medium and stored it at -80 °C for
the measurement of LTB4 release as described below. We maintained cells at
37 °C in a humidified atmosphere of 5% carbon dioxide-95% air. We treated all
samples with indomethacin (1 µM) to block the cyclooxygenase enzyme.

Ionomycin-induced release of LTB4 in neutrophils. We used the LTB4 
Immunoassay (R&D Systems) to quantify LTB4 concentration in supernatant
from cultured ionomycin-stimulated neutrophils. The assay we used is based on 
the competitive binding technique in which LTB4 present in the testing samples 
(200 µl) competes with a fixed amount of alkaline phosphatase-labeled LTB4 for
sites on a rabbit polyclonal antibody. During the incubation, the polyclonal anti- 
body becomes bound to a goat antibody to rabbit coated onto the microplates. 
After washing to remove excess conjugate and unbound sample, a substrate solu- 
tion was added to the wells to determine the bound enzyme activity. We stopped 
the color development and read the absorbance at 405 nm. The intensity of the 
color is inversely proportional to the concentration of LTB4 in the sample. Each 
LTB4 measurement using the LTB4 Immunoassay was done in duplicate.

British study population. We recruited three separate British cohorts as 
described previously 48,49 . The first two cohorts comprised 549 individuals from
among those who were admitted to the coronary care units of the Leicester
Royal Infirmary, Leicester (July 1993-April 1994), and the Royal Hallamshire 
Hospital, Sheffield (November 1995-March 1997), and satisfied the WHO cri- 
teria for acute myocardial infarction in terms of symptoms, elevations in car- 
diac enzymes or electrocardiographic changes 50 . We recruited 532 control 
individuals in each hospital from adult visitors of individuals with noncardio- 
vascular disease on general medical, surgical, orthopedic and obstetric wards to 
find subjects representative of the source population from which the affected 
individuals originated. Individuals who reported a history of coronary heart 
disease were excluded. 

In the third cohort, we recruited 204 individuals retrospectively from the 
registries of three coronary care units in Leicester. All had suffered a myocardial 
infarction according to WHO criteria before the age of 50 years. At the time of 
participation, individuals were at least 3 months from the acute event. The control
cohort comprised 198 individuals with no personal or family history of
premature coronary heart disease, matched for age, sex and current smoking 
status with the cases. We recruited control individuals from three primary care 
practices located in the same geographical area. In all cohorts, individuals were 
white of Northern European origin. Local research ethics committees approved 
all the studies, and individuals provided written informed consent for use of 
samples in genetic studies of coronary artery disease. 

URLs. The Sputnik program is available at http://espressosoftware.com/pages/ 
sputnik.jsp. The National Center for Biotechnology Information SNP database 
is available at http://www.ncbi.nlm.nih.gov/SNP/index.html. 

Note: Supplementary information is available on the Nature Genetics website. 

ACKNOWLEDGMENTS 

We thank the affected individuals and their families whose contribution made 
this study possible and the nurses at the Icelandic Heart Association, personnel 
at the deCODE core facilities, T. Jonsdottir, F. Runarsson, E. Palsdottir, J. Kostic, 
K. Channer, R. Steeds, R. Singh and P. Braund for their contributions. N.J.S. is 
supported by the British Heart Foundation. 

COMPETING INTERESTS STATEMENT 

The authors declare competing financial interests (see the Nature Genetics website 
for details). 

Received 7 January; accepted 26 January 2004 
Published online at http://www.nature.com/naturegenetics/ 



1. Bonow, R.O., Smaha, L.A., Smith, S.C. Jr., Mensah, G.A. & Lenfant, C. World Heart
Day 2002: the international burden of cardiovascular disease: responding to the
emerging global epidemic. Circulation 106, 1602-1605 (2002).

2. Heart Disease and Stroke Statistics, 2003 Update (American Heart Association,
Dallas, Texas, 2002).

3. Lusis, A.J. Atherosclerosis. Nature 407, 233-241 (2000).

4. Libby, P. Inflammation in atherosclerosis. Nature 420, 868-874 (2002).

5. Stratford, N., Britten, K. & Gallagher, P. Inflammatory infiltrates in human coronary
atherosclerosis. Atherosclerosis 59, 271-276 (1986).

6. Poole, J.C. & Florey, H.W. Changes in the endothelium of the aorta and the behaviour
of macrophages in experimental atheroma of rabbits. J. Pathol. Bacteriol. 75,
245-251 (1958).

7. Topol, E.J. et al. Single nucleotide polymorphisms in multiple novel thrombospondin
genes may be associated with familial premature myocardial infarction. Circulation
104, 2641-2644 (2001).

8. Ozaki, K. et al. Functional SNPs in the lymphotoxin-alpha gene that are associated
with susceptibility to myocardial infarction. Nat. Genet. 32, 650-654 (2002).

9. Yamada, Y. et al. Prediction of the risk of myocardial infarction from polymorphisms
in candidate genes. N. Engl. J. Med. 347, 1916-1923 (2002).

10. Broeckel, U. et al. A comprehensive linkage analysis for myocardial infarction and its
related risk factors. Nat. Genet. 30, 210-214 (2002).

11. Francke, S. et al. A genome-wide scan for coronary heart disease suggests in Indo-Mauritians
a susceptibility locus on chromosome 16p13 and replicates linkage with
the metabolic syndrome on 3q27. Hum. Mol. Genet. 10, 2751-2765 (2001).

12. Harrap, S.B. et al. Genome-wide linkage analysis of the acute coronary syndrome suggests
a locus on chromosome 2. Arterioscler. Thromb. Vasc. Biol. 22, 874-878 (2002).

13. Pajukanta, P. et al. Two loci on chromosomes 2 and X for premature coronary heart
disease identified in early- and late-settlement populations of Finland. Am. J. Hum.
Genet. 67, 1481-1493 (2000).

14. Wang, L., Fan, C., Topol, S.E., Topol, E.J. & Wang, Q. Mutation of MEF2A in an inherited
disorder with features of coronary artery disease. Science 302, 1578-1581
(2003).






15. Dixon, R.A. et al. Requirement of a 5-lipoxygenase-activating protein for leukotriene
synthesis. Nature 343, 282-284 (1990).

16. Mehrabian, M. et al. Identification of 5-lipoxygenase as a major gene contributing to
atherosclerosis susceptibility in mice. Circ. Res. 91, 120-126 (2002).

17. Brezinski, D.A., Nesto, R.W. & Serhan, C.N. Angioplasty triggers intracoronary
leukotrienes and lipoxin A4. Impact of aspirin therapy. Circulation 86, 56-63 (1992).

18. Spanbroek, R. et al. Expanding expression of the 5-lipoxygenase pathway within the
arterial wall during human atherogenesis. Proc. Natl. Acad. Sci. USA 100,
1238-1243 (2003).

19. The World Health Organization MONICA Project (monitoring trends and determinants
in cardiovascular disease): a major international collaboration. WHO MONICA Project
Principal Investigators. J. Clin. Epidemiol. 41, 105-114 (1988).

20. Koshino, T. et al. Novel polymorphism of the 5-lipoxygenase activating protein (FLAP)
promoter gene associated with asthma. Mol. Cell. Biol. Res. Commun. 2, 32-35
(1999).

21. Sala, A., Bolla, M., Zarini, S., Muller-Peddinghaus, R. & Folco, G. Release of
leukotriene A4 versus leukotriene B4 from human polymorphonuclear leukocytes. J.
Biol. Chem. 271, 17944-17948 (1996).

22. Dahinden, C.A., Clancy, R.M., Gross, M., Chiller, J.M. & Hugli, T.E. Leukotriene C4
production by murine mast cells: evidence of a role for extracellular leukotriene A4.
Proc. Natl. Acad. Sci. USA 82, 6632-6636 (1985).

23. Fiore, S. & Serhan, C.N. Formation of lipoxins and leukotrienes during receptor-mediated
interactions of human platelets and recombinant human
granulocyte/macrophage colony-stimulating factor-primed neutrophils. J. Exp. Med.
172, 1451-1457 (1990).

24. Ford-Hutchinson, A.W. Leukotriene B4 in inflammation. Crit. Rev. Immunol. 10,
1-12 (1990).

25. Samuelsson, B. Leukotrienes: mediators of immediate hypersensitivity reactions and
inflammation. Science 220, 568-575 (1983).

26. Burke, J.A., Levi, R., Guo, Z.G. & Corey, E.J. Leukotrienes C4, D4 and E4: effects on
human and guinea-pig cardiac preparations in vitro. J. Pharmacol. Exp. Ther. 221,
235-241 (1982).

27. Roth, D.M. & Lefer, A.M. Studies on the mechanism of leukotriene induced coronary
artery constriction. Prostaglandins 26, 573-581 (1983).

28. Wargovich, T., Mehta, J., Nichols, W.W., Pepine, C.J. & Conti, C.R. Reduction in
blood flow in normal and narrowed coronary arteries of dogs by leukotriene C4. J. Am.
Coll. Cardiol. 6, 1047-1051 (1985).

29. Falk, E., Shah, P.K. & Fuster, V. Coronary plaque disruption. Circulation 92, 657-671
(1995).

30. Dwyer, J.H. et al. Arachidonate 5-Lipoxygenase Promoter Genotype, Dietary
Arachidonic Acid, and Atherosclerosis. N. Engl. J. Med. 350, 29-37 (2004).

31. Aiello, R.J. et al. Leukotriene B4 receptor antagonism reduces monocytic foam cells
in mice. Arterioscler. Thromb. Vasc. Biol. 22, 443-449 (2002).

32. Gretarsdottir, S. et al. The gene encoding phosphodiesterase 4D confers risk of
ischemic stroke. Nat. Genet. 35, 131-138 (2003).

33. Gretarsdottir, S. et al. Localization of a susceptibility gene for common forms of
stroke to 5q12. Am. J. Hum. Genet. 70, 593-603 (2002).

34. Gudmundsson, G. et al. Localization of a gene for peripheral arterial occlusive disease
to chromosome 1p31. Am. J. Hum. Genet. 70, 586-592 (2002).

35. Gulcher, J.R., Kristjansson, K., Gudbjartsson, H. & Stefansson, K. Protection of privacy
by third-party encryption in genetic research in Iceland. Eur. J. Hum. Genet. 8,
739-742 (2000).

36. Kong, A. & Cox, N.J. Allele-sharing models: LOD scores and accurate linkage tests.
Am. J. Hum. Genet. 61, 1179-1188 (1997).

37. Gudbjartsson, D.F., Jonasson, K., Frigge, M.L. & Kong, A. Allegro, a new computer
program for multipoint linkage analysis. Nat. Genet. 25, 12-13 (2000).

38. Kong, A. et al. A high-resolution recombination map of the human genome. Nat.
Genet. 31, 241-247 (2002).

39. Whittemore, A.S. & Halpern, J. A class of tests for linkage using affected pedigree
members. Biometrics 50, 118-127 (1994).

40. Kruglyak, L., Daly, M.J., Reeve-Daly, M.P. & Lander, E.S. Parametric and nonparametric
linkage analysis: a unified multipoint approach. Am. J. Hum. Genet. 58,
1347-1363 (1996).

41. Nicolae, D. Allele Sharing Models in Gene Mapping: A Likelihood Approach
(University of Chicago, 1999).

42. Dempster, A., Laird, N.M. & Rubin, D.B. Maximum likelihood from incomplete data
via the EM algorithm. J. R. Stat. Soc. B 39, 1-38 (1977).

43. Terwilliger, J.D. & Ott, J. A haplotype-based 'haplotype relative risk' approach to
detecting allelic associations. Hum. Hered. 42, 337-346 (1992).

44. Falk, C.T. & Rubinstein, P. Haplotype relative risks: an easy reliable way to construct a
proper control sample for risk calculations. Ann. Hum. Genet. 51 Pt 3, 227-233
(1987).

45. Lewontin, R.C. The interaction of selection and linkage. II. Optimum models.
Genetics 50, 757-782 (1964).

46. Hill, W.G. & Robertson, A. The effects of inbreeding at loci with heterozygote advantage.
Genetics 60, 615-628 (1968).

47. Chen, X., Zehnbauer, B., Gnirke, A. & Kwok, P.Y. Fluorescence energy transfer detection
as a homogeneous DNA diagnostic method. Proc. Natl. Acad. Sci. USA 94,
10756-10761 (1997).

48. Steeds, R., Adams, M., Smith, P., Channer, K. & Samani, N.J. Distribution of tissue
plasminogen activator insertion/deletion polymorphism in myocardial infarction and
control subjects. Thromb. Haemost. 79, 980-984 (1998).

49. Brouilette, S., Singh, R.K., Thompson, J.R., Goodall, A.H. & Samani, N.J. White cell
telomere length and risk of premature myocardial infarction. Arterioscler. Thromb.
Vasc. Biol. 23, 842-846 (2003).

50. Nomenclature and criteria for diagnosis of ischemic heart disease. Report of the Joint
International Society and Federation of Cardiology World Health Organization task
force on standardization of clinical nomenclature. Circulation 59, 607-609 (1979).



NATURE GENETICS VOLUME 36 | NUMBER 3 | MARCH 2004



Am. J. Hum. Genet. 76:000-000, 2005

Report



Association between the Gene Encoding 5-Lipoxygenase-Activating Protein 
and Stroke Replicated in a Scottish Population 

A. Helgadottir, 1 S. Gretarsdottir, 1 D. St. Clair, 2 A. Manolescu, 1 J. Cheung, 2 G. Thorleifsson, 1
A. Pasdar, 2 S. F. A. Grant, 1 L. J. Whalley, 2 H. Hakonarson, 1 U. Thorsteinsdottir, 1 A. Kong, 1
J. Gulcher, 1 K. Stefansson, 1 and M. J. MacLeod 2

1 deCODE Genetics, Reykjavik; and 2 Aberdeen Royal Infirmary and University of Aberdeen Medical School, Aberdeen, Scotland

Cardiovascular diseases, including myocardial infarction (MI) and stroke, most often occur on the background of 
atherosclerosis, a condition attributed to the interactions between multiple genetic and environmental risk factors. 
We recently reported a linkage and association study of MI and stroke that yielded a genetic variant, HapA, in 
the gene encoding 5-lipoxygenase-activating protein (ALOX5AP), that associates with both diseases in Iceland. We 
also described another ALOX5AP variant, HapB, that associates with MI in England. To further assess the contribution
of the ALOX5AP variants to cardiovascular diseases in a population outside Iceland, we genotyped seven
single-nucleotide polymorphisms that define both HapA and HapB from 450 patients with ischemic stroke and 
710 controls from Aberdeenshire, Scotland. The Icelandic at-risk haplotype, HapA, had significantly greater fre- 
quency in Scottish patients than in controls. The carrier frequency in patients and controls was 33.4% and 26.4%, 
respectively, which resulted in a relative risk of 1.36, under the assumption of a multiplicative model (P = .007). 
We did not detect association between HapB and ischemic stroke in the Scottish cohort. However, we observed 
that HapB was overrepresented in male patients. This replication of haplotype association with stroke in a population
outside Iceland further supports a role for ALOX5AP in cardiovascular diseases.
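As a consistency check on the figures quoted above, the relative risk of 1.36 can be approximately recovered from the two carrier frequencies (33.4% and 26.4%). The sketch assumes Hardy-Weinberg proportions to convert carrier frequencies to allelic frequencies and takes the relative risk under a multiplicative model to equal the allelic odds ratio; the authors' own estimate came from a likelihood analysis, so this is an illustration and not their procedure. Function names are hypothetical.

```python
import math


def allele_freq_from_carrier_freq(carrier_freq: float) -> float:
    """Allelic frequency p such that 1 - (1 - p)^2 equals the carrier frequency.

    Assumes Hardy-Weinberg proportions (an assumption of this sketch).
    """
    return 1.0 - math.sqrt(1.0 - carrier_freq)


def multiplicative_rr(p_cases: float, p_controls: float) -> float:
    """Per-copy relative risk under a multiplicative model (allelic odds ratio)."""
    return (p_cases / (1.0 - p_cases)) / (p_controls / (1.0 - p_controls))


# Carrier frequencies quoted in the abstract for patients and controls.
p_patients = allele_freq_from_carrier_freq(0.334)
p_controls = allele_freq_from_carrier_freq(0.264)
rr = multiplicative_rr(p_patients, p_controls)
print(f"allelic frequencies: {p_patients:.3f} vs {p_controls:.3f}, RR = {rr:.2f}")
```

Under these assumptions the carrier frequencies correspond to allelic frequencies of roughly 18% and 14%, and the resulting ratio agrees with the reported relative risk to two decimal places.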



Cardiovascular diseases (CVDs), such as coronary heart 
disease and stroke, are major causes of death and dis- 
ability in western societies (Aboderin et al. 2002). As a 
result of the increasing age of the population, the preva- 
lence of CVD is rising worldwide (American Heart As- 
sociation 2002). CVDs are largely attributed to athero- 
sclerosis, which has various environmental and genetic 
risk factors. It is a commonly held view that chronic in- 
flammation initiates and promotes the development of 
atherosclerotic lesions (Lusis 2000; Libby 2002). Large 
epidemiologic studies have demonstrated correlations be- 
tween increased production of markers of systemic in- 
flammation and future cardiovascular events, including 
myocardial infarction (MI) (Ridker et al. 1997, 1998; 



Received September 28, 2004; accepted for publication December 
13, 2004; electronically published January 7, 2005. 

Address for correspondence and reprints: Dr. K. Stefansson, deCODE 
Genetics Inc., Sturlugata 8, 101 Reykjavik, Iceland. E-mail: kstefans 
@decode.is 

© 2005 by The American Society of Human Genetics. All rights reserved. 
0002-9297/2005/7603-00XX$15.00



Danesh et al. 2000) and stroke (Di Napoli et al. 2001), 
which supports a central role for inflammation in CVD. 

We recently published the association of a variant in 
the gene encoding 5-lipoxygenase-activating protein 
(ALOX5AP [MIM 603700]) with both MI and stroke
in an Icelandic population (Helgadottir et al. 2004).
ALOX5AP, which encodes an important component of
the leukotriene pathway, was identified through a genomewide
linkage scan conducted on 296 families with
MI and subsequent analysis that determined association
with markers within the mapped region on chromosome
13q12-13. A haplotype spanning ALOX5AP, HapA, defined
by four SNPs, was shown to be associated with MI
(relative risk = 1.8; P = .0000023) and, subsequently,
the same variant was found to confer risk of stroke in
Iceland (relative risk [RR] = 1.7; P = .000095) (Helgadottir
et al. 2004). Another SNP-based haplotype within
ALOX5AP, HapB, showed significant association with
MI in British cohorts from Leicester and Sheffield
(RR = 2.0; P = .00037) (Helgadottir et al. 2004). We 
further demonstrated that leukotriene B4 (LTB4) syn- 
thesis by neutrophils from patients with a history of MI 





Am. J. Hum. Genet. 76:000-000, 2005



is greater than the synthesis by those from controls with- 
out MI (Helgadottir et al. 2004). 

In the present study, we attempted to replicate the 
association of ALOX5AP with stroke in a population
outside Iceland. The SNPs defining HapA (SG13S25,
SG13S114, SG13S89, and SG13S32) and HapB
(SG13S377, SG13S114, SG13S41, and SG13S35) were
genotyped for 450 Scottish patients who had experienced 
a stroke and for 710 controls. The patient and control 
cohorts have been described elsewhere (MacLeod et al. 
1999; Meiklejohn et al. 2001; Duthie et al. 2002; Whal- 
ley et al. 2004). In brief, 450 patients from northeastern 
Scotland with CT confirmation of ischemic stroke (in- 
cluding 26 patients with transient ischemic attack [TIA]) 
were recruited between 1997 and 1999, within 1 wk of 
admission to the Acute Stroke Unit at Aberdeen Royal 
Infirmary. Patients were further subclassified in accordance
with the TOAST (Trial of Org 10172 in Acute
Stroke Treatment) research criteria (Adams et al. 1993). 
Of the patients, 155 (34.4%) had large-vessel stroke, 96 
(21.3%) had cardiogenic stroke, and 109 (24.2%) had
small-vessel stroke; for 5 (1.1%) of the patients, stroke 
with other determined etiology was diagnosed, 7 (1.6%) 
had more than one etiology, and 78 (17.3%) had un- 
known cause of stroke despite extensive evaluation. A 
total of 710 control individuals with no history of stroke 
or TIA were recruited during follow-up of the 1921 
(n = 227) and 1936 (n = 371) Aberdeen Birth Cohort 
Studies originally recruited in 1932 and 1947, respec- 
tively, as part of the Scottish mental surveys (Deary et 
al. 2004). A further 112 controls were recruited from 
local primary-care practices (Meiklejohn et al. 2001). 
Basic clinical characteristics of patients and control in- 
dividuals are shown in table 1. Approval for the study 
was granted by the local research ethics committee, and 
all study participants gave written informed consent. 

The haplotype analysis was performed using the pro- 
gram NEMO (Gretarsdottir et al. 2003). NEMO handles 
missing genotypes and uncertainty with phase through a 
likelihood procedure, by use of the expectation-maxi- 
mization algorithm as a computational tool to estimate 
haplotype frequencies. Since we were testing only two 
haplotypes, which had been shown elsewhere to confer 
risk of MI and stroke in an Icelandic cohort and MI in 
an English cohort, the reported P values are one sided. 
For the at-risk haplotypes, we calculated RR and popu- 
lation-attributable risk (PAR) under the assumption of 
a multiplicative model (Falk and Rubinstein 1987; Ter- 
williger and Ott 1992) in which the risk of the two alleles 
of haplotypes a person carries multiplies. 
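
The expectation-maximization estimate of haplotype frequencies described above can be illustrated with a minimal Python sketch for two biallelic SNPs. This is not NEMO: the function name `em_haplotype_freqs`, the 0/1/2 genotype coding, and the toy genotypes are assumptions made purely for illustration, and missing genotypes are not handled. With two SNPs, only double heterozygotes are phase ambiguous, so the E-step splits them between the two possible haplotype pairs in proportion to the current frequency estimates.

```python
from collections import Counter

HAPLOTYPES = [(0, 0), (0, 1), (1, 0), (1, 1)]


def em_haplotype_freqs(genotypes, max_iter=200, tol=1e-12):
    """Estimate two-SNP haplotype frequencies from unphased genotypes.

    genotypes: list of (g1, g2); each g is the number of copies (0, 1, 2)
    of the allele coded 1 at that SNP (hypothetical coding, not NEMO's).
    """
    counts = Counter(genotypes)
    n_chrom = 2 * len(genotypes)
    freqs = {h: 0.25 for h in HAPLOTYPES}  # uniform starting point
    for _ in range(max_iter):
        expected = {h: 0.0 for h in HAPLOTYPES}
        for (g1, g2), n in counts.items():
            if g1 == 1 and g2 == 1:
                # E-step for the phase-ambiguous double heterozygote:
                # either {00, 11} or {01, 10}, weighted by current freqs.
                w_cis = freqs[(0, 0)] * freqs[(1, 1)]
                w_trans = freqs[(0, 1)] * freqs[(1, 0)]
                total = w_cis + w_trans
                p_cis = w_cis / total if total > 0 else 0.5
                expected[(0, 0)] += n * p_cis
                expected[(1, 1)] += n * p_cis
                expected[(0, 1)] += n * (1 - p_cis)
                expected[(1, 0)] += n * (1 - p_cis)
            else:
                # Phase is determined: one haplotype carries allele 1 only
                # where the genotype is homozygous for it, the other wherever
                # at least one copy is present.
                first = (int(g1 == 2), int(g2 == 2))
                second = (int(g1 >= 1), int(g2 >= 1))
                expected[first] += n
                expected[second] += n
        # M-step: expected haplotype counts become the new frequencies.
        new_freqs = {h: expected[h] / n_chrom for h in HAPLOTYPES}
        change = max(abs(new_freqs[h] - freqs[h]) for h in HAPLOTYPES)
        freqs = new_freqs
        if change < tol:
            break
    return freqs


# Toy data (hypothetical): the double heterozygotes are resolved toward the
# haplotypes that the unambiguous genotypes support.
toy = [(0, 0)] * 4 + [(2, 2)] * 4 + [(1, 1)] * 2
estimates = em_haplotype_freqs(toy)
print(estimates)
```

A four-SNP haplotype such as HapA would need the same idea extended to every genotype with more than one heterozygous site; the two-SNP case is shown only because it keeps the E-step explicit.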

The results of the haplotype-association analysis for 
HapA and HapB are shown in table 2. The haplotype 
frequencies of HapA in the Scottish populations (patient 
and control) were higher than in the corresponding Ice- 
landic populations (table 2). As demonstrated in the Ice- 



Table 1

Clinical Characteristics of Scottish Patients
and Control Individuals

                                   Patients        Controls
Characteristics                    (n = 450)       (n = 710)
Female:male                        42:58           49:51
Age (years)                        66.8 ± .6       67.2 ± .4
Hypertension (%)                   55.5            23.9
Diabetes (%)                       12.6            2.1
Total cholesterol (mmol/liter)     5.65 ± .06      5.64 ± .05

NOTE. Patients and control individuals were classified
as having hypertension and/or diabetes on the basis of
previous history or receipt of antihypertensive or
antidiabetic therapy. Values with plus-minus symbol (±) are
mean ± SE.



landic population, the estimated frequency of HapA was 
significantly greater in Scottish patients who have suf- 
fered a stroke than in Scottish controls. The carrier fre- 
quency of HapA in Scottish patients and controls was 
33.4% and 26.4%, respectively, which resulted in an RR 
of 1.36 (P = .007) and a corresponding PAR of 9.6%. 
We had previously observed in the Icelandic population 
a higher frequency of HapA in male than in female pa- 
tients with either stroke or MI (Helgadottir et al. 2004). 
This sex difference in the frequency of HapA was not 
observed in the Scottish population (table 2). 
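
The carrier frequencies, RR, and PAR quoted above can be reproduced from the table 2 haplotype frequencies with a short Python sketch. The formulas are the usual multiplicative-model forms and are stated here as assumptions, since the text does not spell them out: carrier frequency 1 - (1 - f)^2, RR as the ratio of haplotype odds in patients and controls, and PAR = 1 - 1/(1 + f(RR - 1))^2 with f the control frequency. The function names are illustrative only.

```python
def carrier_frequency(f):
    """Fraction of individuals carrying at least one copy of a haplotype
    with frequency f, assuming Hardy-Weinberg proportions."""
    return 1.0 - (1.0 - f) ** 2


def relative_risk(f_patients, f_controls):
    """RR under a multiplicative model: ratio of haplotype odds."""
    return (f_patients / (1.0 - f_patients)) / (f_controls / (1.0 - f_controls))


def population_attributable_risk(f_controls, rr):
    """PAR under a multiplicative model with genotype risks 1, RR, RR^2."""
    mean_relative_risk = (1.0 + f_controls * (rr - 1.0)) ** 2
    return 1.0 - 1.0 / mean_relative_risk


# HapA haplotype frequencies in the Scottish cohort (table 2).
f_patients, f_controls = 0.184, 0.142

rr = relative_risk(f_patients, f_controls)
par = population_attributable_risk(f_controls, rr)

print(f"carriers, patients: {carrier_frequency(f_patients):.1%}")
print(f"carriers, controls: {carrier_frequency(f_controls):.1%}")
print(f"RR  = {rr:.2f}")
print(f"PAR = {par:.1%}")
```

Under these assumptions the sketch returns carrier frequencies of 33.4% and 26.4%, an RR of 1.36, and a PAR of 9.6%, in agreement with the values reported in the text.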

We then tested the association of HapB with stroke 
in the Scottish cohort. HapB has been shown elsewhere 
to confer risk of MI in an English cohort (Helgadottir 
et al. 2004). A slight excess of HapB was observed in 
the patient group (6.8%) compared with controls (5.8%), 
but it was not significant (table 2). However, sex-specific 
analysis showed that the frequency of HapB was higher 
in males with ischemic stroke (9.2%) than in controls, 
resulting in an RR of 1.65 (P = .016). The frequency of 
HapB in females with ischemic stroke was 3.5%, which 
was lower but not significantly different from that of 
controls. The frequencies of HapB in males and females 
with ischemic stroke differed significantly (P = .0021). 
Interestingly, as shown in table 2, similar trends were 
observed in our Icelandic cohort; the frequency of HapB 
was greater in males with ischemic stroke (8.6%) than 
in females with ischemic stroke (5.8%), although this 
was not significant (P = .055). 

To summarize our results, we demonstrate in the present
study that HapA, the risk haplotype of ALOX5AP,
reported elsewhere to confer risk of MI and stroke in 
an Icelandic cohort, associates with ischemic stroke in 
a Scottish cohort. HapB, which confers risk of MI in an 
English cohort, was not associated with ischemic stroke 
in the Scottish cohort. However, we observed that HapB 
was overrepresented in male patients. 

Historical and archaeological data have suggested a 
Gaelic ancestry for both Icelanders and Scots. This is 



Reports 




Table 2

Analysis of Association of HapA and HapB with Ischemic Stroke

                                                    HapA                        HapB
Location and Study Population (n)          Frequency   RR     P        Frequency   RR     P
Scotland:
  Controls (710)                             .142                        .058
  Patients with ischemic stroke (450 a):     .184     1.36   .007        .068     1.20   NS
    Males (253)                              .183     1.35   .023        .092     1.65   .016
    Females (181)                            .179     1.34   .044        .035      .58   NS
Iceland:
  Controls (624)                             .095                        .067
  Patients with ischemic stroke (632):       .147     1.63   .00013      .073     1.09   NS
    Males (335)                              .155     1.75   .0002       .086     1.31   NS
    Females (297)                            .138     1.51   .0079       .058      .86   NS

NOTE. Shown are HapA and HapB of ALOX5AP and the corresponding number of individuals
genotyped, the haplotype frequency in the patient and control cohorts, the RR, and the one-sided P
values. HapA is defined by the SNPs SG13S25, SG13S114, SG13S89, and SG13S32, with alleles G,
T, G, and A, respectively, and HapB is defined by the SNPs SG13S377, SG13S114, SG13S41, and
SG13S35, with alleles A, A, A, and G, respectively. For SNP genotyping, we used TaqMan assays
(Applied Biosystems) or the fluorescent-polarization template-directed dye-terminator incorporation
(the SNP-FP-TDI assay), as described elsewhere (Chen et al. 1999). SNP information can be found in
the dbSNP database. The DNA used for the SNP genotyping was the product of whole-genome
amplification, by use of the GenomiPhi Amplification kit (Amersham), of DNA isolated from the
peripheral blood of the Scottish controls and patients with stroke. Data on the Icelandic cohort have
been reported elsewhere (Helgadottir et al. 2004). NS = not significant.

a Sex unknown for 16 patients.



further supported by recent studies of mtDNA and Y- 
chromosome diallelic and microsatellite variation in Ice- 
landers, Scandinavians, and Gaels from Ireland and Scot- 
land (Helgason et al. 2000, 2001). Given this common 
ancestry, it is possible that the two populations share a 
disease-causing variant and that this variant may reside 
on the same common haplotype background (HapA). 
Such a scenario would be consistent with our results; 
although the estimated RR for HapA in the Scottish 
cohort is somewhat lower than in the Icelandic cohort, 
this difference is not statistically significant. Indeed, a 
similar observation has been made in previous studies 
of schizophrenia in Iceland and Scotland (Stefansson et 
al. 2003), in which the same extended haplotype was 
found to confer risk of schizophrenia in both popula- 
tions, with comparable frequencies in patient and con- 
trol groups in the two countries. 

The gene ALOX5AP encodes the membrane-associated
5-lipoxygenase-activating protein (FLAP), an important
mediator of the activity of cellular 5-lipoxygenase
(5-LO), which is a key enzyme in the biosynthesis of 
leukotrienes (Dixon et al. 1990; Miller et al. 1990). Leu- 
kotrienes are proinflammatory mediators produced pre- 
dominantly in inflammatory cells such as polymorpho- 
nuclear leukocytes, macrophages, and mast cells. Over 
the last decade, a number of studies have supported an 
important role for inflammation in atherosclerosis — from 
atheroma initiation to promotion of plaque rupture, 
thereby triggering thrombosis, the main atherosclerotic 
complication that causes MI and stroke (Libby 2002). 



The 5-LO pathway could be an important contributor 
to the pathophysiology of atherosclerosis through the 
formation of the proinflammatory LTB4 and/or through 
an increase in vascular permeability caused by cysteinyl 
leukotrienes. Indeed, we have shown increased produc- 
tion of LTB4 in neutrophils from patients with history 
of MI, compared with controls without history of MI 
(Helgadottir et al. 2004). This is further supported by 
recent human-expression studies (Spanbroek et al. 2003)
that show an increased expression of members of the 5- 
LO pathway, including 5-LO and FLAP, in atheroscle- 
rotic lesions at various stages of their development. 
Moreover, a promoter variant of 5-LO (ALOX5 [MIM
152390]) has been shown to be associated with increased 
carotid artery intima-media thickness and with height- 
ened inflammatory biomarkers (Dwyer et al. 2004). In 
addition, an atherosclerotic mouse model with a hetero- 
zygous deficiency of 5-LO shows resistance to athero- 
sclerosis (Mehrabian et al. 2002), and an LTB4 receptor 
antagonist blocks the development of atherosclerosis in 
apoE- and LDLR-deficient mice (Aiello et al. 2002; 
Mehrabian et al. 2002). Together, these studies suggest 
that chronic upregulation of the leukotriene pathway 
may be harmful to the vasculature, in terms of athero- 
sclerosis progression and plaque instability. 

The precise mechanism by which the ALOX5AP variants
confer risk of MI and stroke is still unclear. As
reported elsewhere, we have not observed SNPs in the 
coding sequence that led to amino acid substitution (Hel- 
gadottir et al. 2004). Therefore, one can speculate that 






unidentified variation in regulatory regions of the gene
(variation that affects transcription, splicing, message
stability, message transport, or translation efficiency) may
underlie the risk conferred by ALOX5AP.

The results of the present study show that HapA as- 
sociates with ischemic stroke in a Scottish population, 
thereby providing replication of work that showed that 
the same haplotype confers increased risk of stroke in 
an Icelandic population. This replication constitutes ad- 
ditional evidence for the role of ALOX5AP in the patho- 
genesis of stroke. Identification of genetic risk factors for 
the common forms of stroke may facilitate identification 
of individuals at increased risk and may lead to novel 
strategies for the prevention and treatment of stroke. 

Acknowledgments

The authors thank Thorbjorg Jonsdottir and personnel at 
the deCODE core facilities for valuable contributions. The 
authors acknowledge the support of the Chief Scientist's Of- 
fice, Scottish Executive. 

Electronic-Database Information

The URLs for data presented herein are as follows: 

dbSNP, http://www.ncbi.nlm.nih.gov/SNP/ 
Online Mendelian Inheritance in Man (OMIM),
http://www.ncbi.nlm.nih.gov/Omim/ (for ALOX5AP and ALOX5)

References

Aboderin I, Kalache A, Ben-Shlomo Y, Lynch JW, Yajnik CS, 
Kuh D, Yach D (2002) Life course perspectives on coronary 
heart disease, stroke and diabetes: key issues and implications
for policy and research. World Health Organization,
Geneva 

Adams HP Jr, Bendixen BH, Kappelle LJ, Biller J, Love BB, 
Gordon DL, Marsh EE 3rd (1993) Classification of subtype 
of acute ischemic stroke: definitions for use in a multicenter 
clinical trial. Stroke 24:35-41 

Aiello RJ, Bourassa PA, Lindsey S, Weng W, Freeman A, Show- 
ell HJ (2002) Leukotriene B4 receptor antagonism reduces 
monocytic foam cells in mice. Arterioscler Thromb Vasc Biol
22:443-449 

American Heart Association (2002) Heart disease and stroke
statistics: 2003 update, Dallas

Chen X, Levine L, Kwok PY (1999) Fluorescence polarization
in homogeneous nucleic acid analysis. Genome Res 9:492-498

Danesh J, Whincup P, Walker M, Lennon L, Thomson A, Ap- 
pleby P, Gallimore JR, Pepys MB (2000) Low grade inflam- 
mation and coronary heart disease: prospective study and up- 
dated meta-analyses. BMJ 321:199-204 

Deary IJ, Whiteman MC, Starr JM, Whalley LJ, Fox HC (2004)
The impact of childhood intelligence on later life: following
up the Scottish mental surveys of 1932 and 1947. J Pers Soc
Psychol 86:130-147

Di Napoli M, Papa F, Bocola V (2001) C-reactive protein in
ischemic stroke: an independent prognostic factor. Stroke 32:
917-924

Dixon RA, Diehl RE, Opas E, Rands E, Vickers PJ, Evans JF, 
Gillard JW, Miller DK (1990) Requirement of a 5-lipoxy- 
genase-activating protein for leukotriene synthesis. Nature 
343:282-284 

Duthie SJ, Whalley LJ, Collins AR, Leaper S, Berger K, Deary 
IJ (2002) Homocysteine, B vitamin status, and cognitive func- 
tion in the elderly. Am J Clin Nutr 75:908-913 

Dwyer JH, Allayee H, Dwyer KM, Fan J, Wu H, Mar R, Lusis
AJ, Mehrabian M (2004) Arachidonate 5-lipoxygenase promoter
genotype, dietary arachidonic acid, and atherosclerosis.
N Engl J Med 350:29-37

Falk CT, Rubinstein P (1987) Haplotype relative risks: an easy 
reliable way to construct a proper control sample for risk 
calculations. Ann Hum Genet 51:227-233 

Gretarsdottir S, Thorleifsson G, Reynisdottir ST, Manolescu 
A, Jonsdottir S, Jonsdottir T, Gudmundsdottir T, et al (2003) 
The gene encoding phosphodiesterase 4D confers risk of 
ischemic stroke. Nat Genet 35:131-138 

Helgadottir A, Manolescu A, Thorleifsson G, Gretarsdottir S, 
Jonsdottir H, Thorsteinsdottir U, Samani NJ, et al (2004) 
The gene encoding 5-lipoxygenase activating protein confers 
risk of myocardial infarction and stroke. Nat Genet 36:233- 
239 

Helgason A, Hickey E, Goodacre S, Bosnes V, Stefansson K, 
Ward R, Sykes B (2001) mtDNA and the islands of the North 
Atlantic: estimating the proportions of Norse and Gaelic 
ancestry. Am J Hum Genet 68:723-737 

Helgason A, Sigurdardottir S, Nicholson J, Sykes B, Hill EW, 
Bradley DG, Bosnes V, Gulcher JR, Ward R, Stefansson K 
(2000) Estimating Scandinavian and Gaelic ancestry in the 
male settlers of Iceland. Am J Hum Genet 67:697-717 

Libby P (2002) Inflammation in atherosclerosis. Nature 420: 
868-874 

Lusis AJ (2000) Atherosclerosis. Nature 407:233-241

MacLeod MJ, Dahiyat MT, Cumming A, Meiklejohn D, Shaw
D, St Clair D (1999) No association between glu/asp polymorphism
of NOS3 gene and ischemic stroke. Neurology
53:418-420

Mehrabian M, Allayee H, Wong J, Shi W, Wang XP, Shaposh- 
nik Z, Funk CD, Lusis AJ, Shih W (2002) Identification of 
5-lipoxygenase as a major gene contributing to atheroscle- 
rosis susceptibility in mice. Circ Res 91:120-126 

Meiklejohn DJ, Vickers MA, Dijkhuisen R, Greaves M (2001) 
Plasma homocysteine concentrations in the acute and con- 
valescent periods of atherothrombotic stroke. Stroke 32:57- 
62 

Miller DK, Gillard JW, Vickers PJ, Sadowski S, Leveille C, 
Mancini JA, Charleson P, Dixon RAF, Ford-Hutchinson AW, 
Fortin R, Gautier JY, Rodkey J, Rosen R, Rouzer C, Sigal 
IS, Strader CD, Evans JF (1990) Identification and isolation 
of a membrane protein necessary for leukotriene production. 
Nature 343:278-281 

Ridker PM, Buring JE, Shih J, Matias M, Hennekens CH (1998)
Prospective study of C-reactive protein and the risk of future
cardiovascular events among apparently healthy women.
Circulation 98:731-733

Ridker PM, Cushman M, Stampfer MJ, Tracy RP, Hennekens
CH (1997) Inflammation, aspirin, and the risk of cardiovascular
disease in apparently healthy men. N Engl J Med
336:973-979

Spanbroek R, Grabner R, Lotzer K, Hildner M, Urbach A, 
Ruhling K, Moos MP, Kaiser B, Cohnert TU, Wahlers T, 
Zieske A, Plenz G, Robenek H, Salbach P, Kuhn H, Radmark 
O, Samuelsson B, Habenicht AJ (2003) Expanding expres- 
sion of the 5-lipoxygenase pathway within the arterial wall 
during human atherogenesis. Proc Natl Acad Sci USA 100: 
1238-1243 



Stefansson H, Sarginson J, Kong A, Yates P, Steinthorsdottir 
V, Gudfinnsson E, Gunnarsdottir S, Walker N, Petursson H, 
Crombie C, Ingason A, Gulcher JR, Stefansson K, St Clair 
D (2003) Association of neuregulin 1 with schizophrenia 
confirmed in a Scottish population. Am J Hum Genet 72: 
83-87 

Terwilliger JD, Ott J (1992) A haplotype-based "haplotype 
relative risk" approach to detecting allelic associations. Hum 
Hered 42:337-346 

Whalley LJ, Fox HC, Wahle KW, Starr JM, Deary IJ (2004) 
Cognitive aging, childhood intelligence, and the use of food 
supplements: possible involvement of n-3 fatty acids. Am J 
Clin Nutr 80:1650-1657 




European Journal of Human Genetics (2007) 15, 959-966 

© 2007 Nature Publishing Group All rights reserved 1018-4813/07 $30.00

www.nature.com/ejhg



ALOX5AP gene variants and risk of coronary artery 
disease: an angiography-based study 

Domenico Girelli*,1, Nicola Martinelli1, Elisabetta Trabetti2, Oliviero Olivieri1,
Ugo Cavallari2, Giovanni Malerba2, Fabiana Busti1, Simonetta Friso1, Francesca Pizzolo,
Pier Franco Pignatti2 and Roberto Corrocher1

1Department of Clinical and Experimental Medicine, University of Verona, Verona, Italy; 2Department of Mother and
Child and Biology-Genetics, University of Verona, Verona, Italy

The aim of this study was to explore the role of variants of the gene encoding arachidonate
5-lipoxygenase-activating protein (ALOX5AP) as possible susceptibility factors for coronary artery disease
(CAD) and myocardial infarction (MI) in patients with or without angiographically proven CAD. A total of
1431 patients with or without angiographically documented CAD were examined simultaneously for seven
ALOX5AP single-nucleotide polymorphisms, allowing reconstruction of the at-risk haplotypes (HapA and
HapB) previously identified in the Icelandic and British populations. Using a haplotype-based approach,
HapA was not associated with either CAD or MI. On the other hand, HapB and another haplotype within
the same region (that we named HapC) were significantly more represented in CAD versus CAD-free
patients, and these associations remained significant after adjustment for traditional cardiovascular risk
factors by logistic regression (HapB: odds ratio (OR) 1.67, 95% confidence interval (CI) 1.04-2.67;
P = 0.032; HapC: OR 2.41, 95% CI 1.09-5.32; P = 0.030). No difference in haplotype distributions was
observed between CAD subjects with or without a previously documented MI. Our angiography-based
study suggests a possible modest role of ALOX5AP in the development of the atheroma rather than in its
late thrombotic complications such as MI.

European Journal of Human Genetics (2007) 15, 959-966; doi:10.1038/sj.ejhg.5201854; published online 16 May 2007 



Keywords: ALOX5AP; coronary artery disease; myocardial infarction 



Introduction 

Interest in unraveling the genetic basis of coronary artery 
disease (CAD) has been recently renewed by results 
obtained applying powerful approaches such as genome-wide
scan studies. 1-3 At variance with classic association
studies involving single-nucleotide polymorphisms (SNPs) 
in candidate genes, genome-wide scan studies have the 
advantage of discovering new gene(s), without a priori 



*Correspondence: Professor D Girelli, Department of Clinical and
Experimental Medicine, University of Verona, Policlinico GB Rossi,
37134 Verona, Italy. Tel: +39 45 8124403; Fax: +39 45 580111;
E-mail: domenico.girelli@univr.it

Received 27 October 2006; revised 9 March 2007; accepted 17 April 2007; 
published online 16 May 2007 



hypothesis. A paradigm of the successful use of such 
strategies was the identification of arachidonate 5-lipoxygenase-activating
protein (ALOX5AP) as a susceptibility gene for myocardial infarction (MI)
and stroke. 4 Interestingly, ALOX5AP encodes the 5-lipoxygenase-activating
protein (FLAP), which is an essential regulator of the
biosynthesis of the leukotriene A4 (LTA4). 5,6 Indeed, the
5-lipoxygenase (5-LO)/leukotriene pathway has been independently
implicated in the pathogenesis of atherosclerosis
in humans 7,8 and mice 9 (reviewed by Zhao and Funk 10 ).
While not successful in discovering causal variants in
ALOX5AP, the original study by Helgadottir et al 4 identified
a 4-SNP haplotype, named HapA, as a risk factor for MI 
and stroke in the Icelandic population. The Authors were 
unable to confirm the result in a cohort of British patients 



ALOX5AP gene variants and coronary artery disease

D Girelli et al




with MI; however, in such cohort they reported an 
association of another 4-SNP haplotype, named HapB, 
with MI risk. 4 Few subsequent studies on ALOX5AP in 
different populations yielded conflicting results. Lohmus- 
saar et al 11 studied Central European patients with stroke, 
finding a significant association for several ALOX5AP SNPs, 
including one that was part of HapA. On the other hand, 
studies in North Americans failed to show a significant 
association with either stroke or MI. 12,13

To date, no genetic-epidemiological data are available
for populations from Southern Europe. Moreover, none of
the previous studies specifically attempted to dissect the
role of ALOX5AP in the atherosclerosis phenotype rather
than in its 'complication' phenotype (MI). We therefore
evaluated simultaneously seven ALOX5AP SNPs and their
reconstructed haplotypes as possible risk determinants for 
CAD and MI within the framework of an Italian population 
with or without angiographically confirmed CAD. 



Materials and methods 
Study population 

The Verona Heart Project is an ongoing study aimed to 
identify new risk factors for CAD and MI in a population of 
subjects with angiographic documentation of their cor- 
onary vessels. Details about the enrolment criteria have 
been described elsewhere. 14 In the present study, we 
examined data from a total of 1431 subjects, for whom 
complete analyses of seven ALOX5AP SNPs (see below)
were available. Of these subjects, 1047 had angiographi- 
cally documented severe coronary atherosclerosis (CAD 
group), the majority of them being candidates for coronary 
artery bypass grafting or percutaneous coronary interven- 
tion. The disease severity was evaluated by counting the 
number of major epicardial coronary arteries (left anterior 
descending, circumflex, and right) affected with ≥1
significant stenosis (≥50%). On the other hand, 384
subjects had completely normal coronary arteries, being 
submitted to coronary angiography for reasons other than 
CAD, mainly valvular heart disease (CAD-free group). 
Controls were also required to have neither history nor 
clinical or instrumental evidence of atherosclerosis in 
vascular districts beyond the coronary bed. Since the 
primary aim of our selection was to provide an objective 
and clear-cut definition of the atherosclerotic phenotype, 
subjects with nonsignificant coronary stenosis (ie <50%) 
were not included in the study. CAD subjects were 
classified into MI and non-MI subgroups by combining 
data from history with a thorough review of medical 
records showing diagnostic electrocardiogram and enzyme 
changes, and/or the typical sequelae of MI on ventricular 
angiography. An appropriate documentation was obtained 
for 1046/1047 (99.9%) CAD patients: of those, 624
subjects had a history of previous MI, whereas the
remaining 422 subjects had no history of MI. The 



angiograms were assessed by two cardiologists unaware 
that the patients were to be included in the study. Samples 
of venous blood were drawn from each subject after 
an overnight fast. Serum lipids and the other routine
biochemical parameters were determined as described 
previously. 14 At the time of blood sampling, a complete 
clinical history was collected, including the assessment 
of cardiovascular risk factors such as obesity, smoking, 
hypertension, and diabetes. 

The study was approved by our local Ethical Committee. 
Informed consent was obtained from all the patients after a 
full explanation of the study. 

Genotyping 

To make possible comparison with studies in other 
populations, we selected seven previously described 
ALOX5AP (GeneID: 241; chromosome: 13q12) SNPs
(SG13S25, SG13S377, SG13S114, SG13S89, SG13S32, 
SG13S41 and SG13S35), maintaining their original no- 
menclature, 4 as well as the nomenclature of the recon- 
structed haplotypes. The seven SNPs were initially tested by 
PCR and restriction analyses (Supplementary Table 1) in a 
small group of randomly chosen DNA samples in order to 
verify the heterozygosity in the study population. All the 
samples were then genotyped in two multiplex reactions 
for six SNPs (SG13S377, SG13S41, SG13S32, and SG13S114 
in multiplex one, M1; SG13S25, SG13S35 in multiplex two,
M2) using LightCycler™ real-time PCR technology based 
on fluorescence resonance energy transfer and melting 
point analysis. The sequences of primers and probes used 
for the six SNPs genotyping with melting point analysis 
are shown in Supplementary Table 2. Both primers and 
fluorescently labelled probes were synthesized by Sigma- 
Proligo (Proligo France SAS). PCR and melting curve 
analysis was performed in 20 μl volumes in glass capillaries
(Hoffmann-La Roche). PCR conditions for M1 and M2 are
detailed in Supplementary Tables 3 and 4, respectively. 
Cycling and melting curve analysis conditions were 
different for the two multiplex reactions, as given in 
Supplementary Table 5. As the SG13S89 polymorphism 
was not easily detectable in a multiplex reaction, it was 
genotyped by PCR and restriction analysis for all the 
samples, using the following primers: forward (F)
5'-AAGTGCATCTCAAGGAGGT-3' and reverse (R)
5'-ATTAGCAGAAGAGCCAAGT-3'.

Statistical analysis 

The analyses were performed mainly with SPSS 13.0
statistical package (SPSS Inc., Chicago, IL, USA). Distributions
of continuous variables in groups were expressed as
means ± SD. Quantitative data were assessed using the
Student's t-test. Associations between qualitative variables
were analysed with the χ2 test or Fisher exact test, when
indicated. Allele and genotype frequencies among cases 
and controls were compared with values predicted by 






Hardy-Weinberg equilibrium using the χ2 test. To assess the
association with CAD or MI, relative risks associated with 
each genotype were calculated separately by univariate
logistic regression and then by multiple logistic regression 
adjusted for the traditional cardiovascular risk factors (ie 
sex, age, smoking, hypertension, diabetes, total cholesterol, 
and triglycerides), assuming an additive, dominant or 
recessive mode of inheritance. 
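
As an illustration of the Hardy-Weinberg comparison described above, the following Python sketch computes the one-degree-of-freedom χ2 statistic from three genotype counts. It is a minimal sketch, not the SPSS procedure used in the study. The function name is hypothetical, and the example counts (295, 80, 9) are an assumption: they are back-calculated by rounding the SG13S377 genotype percentages reported for the CAD-free group (76.8%, 20.8%, 2.3% of n = 384) and may differ slightly from the true counts.

```python
import math


def hwe_chi_square(n_hom1, n_het, n_hom2):
    """Chi-square test (1 df) of observed genotype counts against
    Hardy-Weinberg expectations for a biallelic SNP."""
    n = n_hom1 + n_het + n_hom2
    p = (2 * n_hom1 + n_het) / (2 * n)  # frequency of the first allele
    q = 1.0 - p
    observed = (n_hom1, n_het, n_hom2)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Approximate SG13S377 counts in the CAD-free group (see assumptions above).
chi2, p_value = hwe_chi_square(295, 80, 9)
print(f"chi2 = {chi2:.2f}, P = {p_value:.3f}")
```

With these approximate counts the statistic stays below the 3.84 threshold for P = 0.05, which is consistent with the statement in the Results that the genotype frequencies were in Hardy-Weinberg equilibrium.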

Pairwise linkage disequilibrium was examined as described
by Devlin and Risch. 15 Haplotype frequencies
were estimated using R software with haplo.stats package 
(R Foundation for Statistical Computing, Vienna, Austria; 
ISBN 3-900051-07-0, URL: http://www.R-project.org). 16 
The upper and lower bounds of the 95% confidence 
interval (CI) were calculated by simulating 1000 random 
samples from a population having the haplotype frequencies
estimated on the entire sample set. The D' measure was
calculated for each simulated sample. The upper and lower
bounds represent the quantiles corresponding to the 0.025
or 0.975 probabilities of the D' distribution. Haplotype
blocks were defined as proposed by Gabriel et al. 17 The
relationship between haplotypes and clinical outcomes 
was examined using a generalized linear model regression 
of a trait on haplotype effects, allowing for ambiguous 
haplotypes (haplo.glm function), 16 and adjusting for the 
above-mentioned risk factors. Randomization test by 
permuting the cases and controls was performed by means 
of Monte Carlo method to confirm the results. Haplotypes 
present in less than 10 individuals were not considered in 
the analyses. 
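
The simulation-based confidence bounds for pairwise linkage disequilibrium described above can be sketched in Python as follows. This is an illustrative sketch, not the R code used in the study. It assumes the normalized coefficient D' as the disequilibrium measure, samples phased haplotypes directly (the study worked from unphased genotypes), and uses hypothetical haplotype frequencies; all function names are invented for the example.

```python
import random


def d_prime(p_ab, p_a, p_b):
    """Normalized disequilibrium D' for alleles A and B at two SNPs."""
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return d / d_max if d_max > 0 else 0.0


def d_prime_from_haplotypes(freqs):
    """freqs: dict of two-SNP haplotype frequencies keyed AB, Ab, aB, ab."""
    p_a = freqs["AB"] + freqs["Ab"]
    p_b = freqs["AB"] + freqs["aB"]
    return d_prime(freqs["AB"], p_a, p_b)


def simulated_ci(freqs, n_chromosomes, n_sim=1000, seed=0):
    """Draw n_sim samples of n_chromosomes haplotypes from the estimated
    frequencies and return the 0.025 and 0.975 quantiles of D'."""
    rng = random.Random(seed)
    labels = list(freqs)
    weights = [freqs[h] for h in labels]
    values = []
    for _ in range(n_sim):
        draw = rng.choices(labels, weights=weights, k=n_chromosomes)
        sample = {h: draw.count(h) / n_chromosomes for h in labels}
        values.append(d_prime_from_haplotypes(sample))
    values.sort()
    lower = values[int(0.025 * (n_sim - 1))]
    upper = values[int(0.975 * (n_sim - 1))]
    return lower, upper


# Hypothetical haplotype frequencies, for illustration only.
example = {"AB": 0.40, "Ab": 0.10, "aB": 0.15, "ab": 0.35}
point = d_prime_from_haplotypes(example)
lower, upper = simulated_ci(example, n_chromosomes=400, n_sim=200, seed=1)
print(f"D' = {point:.3f}, simulated 95% bounds = ({lower:.3f}, {upper:.3f})")
```

The quantile indexing is deliberately simple; a production analysis would more likely rely on an established statistics library for both the sampling and the quantiles.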

The study power was assessed by means of the Altman 
nomogram, after adjustment for the asymmetric distribution
of population subgroups (CAD-free versus CAD; non-MI
versus MI). The study has adequate power (>90%) to
replicate the findings for odds ratios (ORs) greater than 2.0, 
which is consistent with those observed in the previous 
studies. 4 For each OR, 95% CIs were calculated. A value of 
two-tailed P<0.05 was considered significant. 
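
The relationship between an OR, its 95% CI, and a two-tailed P value can be checked with a short Python sketch. It assumes the intervals are Wald-type, that is, symmetric on the log-odds scale, which the text does not state explicitly; the function name is hypothetical. The inputs are the adjusted estimates for HapB and HapC given in the abstract.

```python
import math


def wald_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Recover the standard error of log(OR), the Wald z statistic, and the
    two-tailed P value from an OR and its 95% CI (assumed Wald-type)."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = math.log(odds_ratio) / se
    p_two_tailed = math.erfc(abs(z) / math.sqrt(2.0))
    return se, z, p_two_tailed


reported = {
    "HapB": (1.67, 1.04, 2.67),  # reported P = 0.032
    "HapC": (2.41, 1.09, 5.32),  # reported P = 0.030
}
for name, (odds_ratio, low, high) in reported.items():
    se, z, p = wald_from_or_ci(odds_ratio, low, high)
    print(f"{name}: SE = {se:.3f}, z = {z:.2f}, two-tailed P = {p:.3f}")
```

Because the published OR and CI limits are rounded to two decimals, the recovered P values agree with the reported ones only approximately.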



Results 

Table 1a summarizes the clinical characteristics of the
study population stratified according to the presence 
(CAD) or absence (CAD-free) of angiographically docu- 
mented CAD. As expected, traditional cardiovascular risk 
factors were more represented in the CAD group. The 
characteristics of the CAD population, divided in two 
subgroups according to the presence or absence of a 
previous documented MI, are reported in Table 1b. The
genotype frequencies for the polymorphisms tested were 
in Hardy-Weinberg equilibrium both in the CAD and 
CAD-free groups. 

Allele and genotype distributions were similar either 
between CAD and CAD-free groups (Table 2a) or within 



Table 1a Clinical characteristics of the study population
stratified according to absence (CAD-free) or presence
(CAD) of angiographically documented CAD

                             CAD-free         CAD
                             (n = 384)        (n = 1047)       P-values
Age (years)                  58.7 ± 12.3      61.2 ± 9.8       <0.001*
Males (%)                    65.6             79.8             <0.001†
BMI (kg/m2)                  25.4 ± 3.5       26.8 ± 3.5       0.936*
Hypertension (%)             40.5             66.3             <0.001†
Smoking (%)                  43.9             67.9             <0.001†
Diabetes (%)                 6.8              19.2             <0.001†
Total cholesterol (mmol/l)   5.47 ± 1.10      5.54 ± 1.17      0.322*
Triglycerides (mmol/l)       1.49 ± 0.67      1.91 ± 0.99      <0.001*

*By t-test; †by χ2 test.

Table 1b Clinical characteristics of the CAD patients,
with (MI) or without (no-MI) a previous documented MI

                             No-MI            MI
                             (n = 422)        (n = 624)        P-values
Age (years)                  62.5 ± 8.9       60.4 ± 10.2      0.002*
Males (%)                    75.6             82.7             0.005†
BMI (kg/m2)                  27.0 ± 3.5       26.6 ± 3.6       0.583*
Hypertension (%)             72.3             62.1             0.001†
Smoking (%)                  62.3             71.7             0.002†
Diabetes (%)                 19.3             19.2             0.592†
Total cholesterol (mmol/l)   5.6 ± 1.1        5.5 ± 1.2        0.677*
Triglycerides (mmol/l)       1.9 ± 0.9        1.9 ± 1.0        0.396*
CAD severity                                                   <0.001†
  One vessel                 24.3             12.4
  Two vessels                26.0             24.1
  Three vessels              48.1             61.8
  Left main coronary artery  1.7              1.7

*By t-test; †by χ2 test.



CAD subjects with or without a previous MI (Table 2b). 
Results from the regression analyses, assuming additive, 
dominant or recessive mode of inheritance, showed no 
significant association of the gene variants tested with the 
clinical outcomes (data not shown). In general, the SNPs 
tested were in linkage disequilibrium, as shown in Table 3. 

Considering haplotype analysis, the most frequent 
haplotypes were G-T-G-C and G-T-A-G for HapA SNPs and 
HapB/C SNPs, respectively, and thus were used as the 
referents. The haplotype distributions for HapA SNPs were 
similar between CAD and CAD-free subjects (P = 0.937). On
the other hand, the haplotype distributions for HapB SNPs 
were significantly different between CAD and CAD-free 
subjects (P = 0.014), as shown in Table 4a. More precisely, 
two haplotypes A-A-A-G (HapB) and G-T-A-A (that we 
named HapC) were more represented in CAD group (7.5 
versus 5.5% and 3.7 versus 1.6%, respectively), and these 
associations remained significant also after adjustment for 



European Journal of Human Genetics 



ALOX5AP gene variants and coronary artery disease

D Girelli et al

Table 2a ALOX5AP genotype and allele distribution in the
study population stratified according to absence (CAD-
free) or presence (CAD) of angiographically documented
CAD

ALOX5AP           CAD-free       CAD
genotype, %       (n = 384)      (n = 1047)     P-values*

SG13S25
  GG              85.7           84.6           0.885
  GA              13.8           14.8
  AA              0.5            0.6
  G allele        92.6           92.0           0.682
  A allele        7.4            8.0

SG13S377
  GG              76.8           73.9           0.180
  GA              20.8           24.6
  AA              2.3            1.4
  G allele        87.2           86.2           0.530
  A allele        12.8           13.8

SG13S114
  TT              42.4           40.6           0.559
  TA              41.7           44.8
  AA              15.9           14.6
  T allele        63.3           63.0           0.921
  A allele        36.7           37.0

SG13S89
  GG              86.7           87.5           0.794
  GA              12.8           12.1
  AA              0.5            0.4
  G allele        93.1           93.6           0.727
  A allele        6.9            6.4

SG13S32
  AA              22.9           22.6           0.747
  AC              51.6           49.9
  CC              25.5           27.5
  C allele        51.3           52.4           0.620
  A allele        48.7           47.6

SG13S41
  AA              81.8           81.9           0.997
  AG              17.4           17.3
  GG              0.8            0.8
  A allele        90.5           90.6           0.995
  G allele        9.5            9.4

SG13S35
  GG              83.9           81.5           0.528
  GA              15.6           18.1
  AA              0.5            0.4
  G allele        91.7           90.5           0.396
  A allele        8.3            9.5

*By χ² test or Fisher's exact test.



traditional cardiovascular risk factors, that is, sex, age, 
smoking, hypertension, diabetes, total cholesterol, and 
triglycerides (Table 4b). The significance of the general 
model, including genetic factors arranged as haplotypes, 
was confirmed after randomization test (P = 0.022 for 
general model, P = 0.013 for HapB and P = 0.021 for HapC, 
after 1000 permutations). 
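
The idea behind a randomization (permutation) test can be sketched as follows. This is a deliberately simplified illustration that compares the carrier frequency of one haplotype between two groups; the study itself applied the randomization test to its haplotype-based regression model, and the function name, inputs, and seed below are assumptions made only for this example.

```python
import random

def permutation_p_value(case_flags, control_flags, n_perm=1000, seed=0):
    """Permutation P-value for a difference in carrier frequency.

    case_flags / control_flags: lists of 0/1 values (1 = carries the
    haplotype). Group labels are shuffled n_perm times and the absolute
    frequency difference is recomputed each time.
    """
    def mean(values):
        return sum(values) / len(values)

    rng = random.Random(seed)
    observed = abs(mean(case_flags) - mean(control_flags))
    pooled = list(case_flags) + list(control_flags)
    n_case = len(case_flags)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_case]) - mean(pooled[n_case:]))
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero
    return (hits + 1) / (n_perm + 1)
```

The P-value is the share of label shufflings that produce a difference at least as large as the one observed, so it does not rely on an assumed sampling distribution.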



Table 2b ALOX5AP genotype and allele distribution in
the CAD group stratified according to absence (no-MI) or
presence (MI) of previously documented MI

ALOX5AP           No-MI          MI
genotype, %       (n = 422)      (n = 624)      P-values*

SG13S25
  GG              84.6           84.6           0.107
  GA              14.2           15.2
  AA              1.2            0.2
  G allele        91.7           92.2           0.727
  A allele        8.3            7.8

SG13S377
  GG              72.0           75.2           0.509
  GA              26.3           23.6
  AA              1.7            1.3
  G allele        85.2           87.0           0.283
  A allele        14.8           13.0

SG13S114
  TT              37.7           42.6           0.263
  TA              47.4           42.9
  AA              14.9           14.4
  T allele        61.4           64.1           0.222
  A allele        38.6           35.9

SG13S89
  GG              87.4           87.7           0.868
  GA              12.3           11.9
  AA              0.2            0.5
  G allele        93.6           93.6           0.936
  A allele        6.4            6.4

SG13S32
  AA              23.9           21.8           0.719
  AC              49.1           50.3
  CC              27.0           27.9
  C allele        51.5           53.0           0.528
  A allele        48.5           47.0

SG13S41
  AA              83.4           81.1           0.515
  AG              16.1           17.9
  GG              0.5            1.0
  A allele        91.5           90.1           0.315
  G allele        8.5            9.9

SG13S35
  GG              81.3           81.7           0.282
  GA              18.7           17.6
  AA              0              0.6
  G allele        90.6           90.5           0.997
  A allele        9.4            9.5

*By χ² test or Fisher's exact test.
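
The allele rows in Tables 2a and 2b follow arithmetically from the genotype rows: each allele frequency is the homozygote percentage plus half of the heterozygote percentage. A minimal sketch (the function name is hypothetical), checked against the SG13S114 no-MI column of Table 2b:

```python
def allele_freqs(hom_first, het, hom_second):
    """Allele percentages from genotype percentages (or counts).

    hom_first: homozygotes for the first allele
    het: heterozygotes
    hom_second: homozygotes for the second allele
    Returns (first_allele_pct, second_allele_pct), rounded to one decimal.
    """
    total = hom_first + het + hom_second
    first = (hom_first + het / 2) / total * 100
    return round(first, 1), round(100 - first, 1)

# SG13S114, no-MI group: TT 37.7, TA 47.4, AA 14.9
t_allele, a_allele = allele_freqs(37.7, 47.4, 14.9)
```

With these inputs the sketch reproduces the tabulated T allele (61.4%) and A allele (38.6%) values.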



There was no difference in haplotype distributions 
between CAD subjects with or without a previous MI, 
either for HapA region or for HapB region (Table 4c). 



Discussion 

The present investigation in Italian patients provides some
evidence that the ALOX5AP gene might play a role in






Table 3 Pairwise linkage disequilibrium analysis

           SG13S25  SG13S377  SG13S114  SG13S89  SG13S32  SG13S41  SG13S35

SG13S25       -      0.013     0.045     0.006    0.087    0.007    0.002
SG13S377    1.000      -       0.185     0.003    0.113    0.007    0.145
SG13S114    0.956    0.834       -       0.072    0.038    0.105    0.026
SG13S89     1.000    0.544     0.774       -      0.050    0.463    0.004
SG13S32     0.969    0.815     0.245     0.882      -      0.093    0.077
SG13S41     0.920    0.666     0.770     0.828    0.987      -      0.009
SG13S35     0.529    0.473     0.392     0.790    0.836    0.914      -

D', Lewontin normalized value; R, correlation coefficient.



conferring susceptibility to CAD also in South European 
populations. Nonetheless, since the statistically significant 
association we found was relatively weak, the role of 
this gene, if any, seems modest. To put our results into a 
more general perspective, we propose the following 
considerations. 

Comparison with previous studies 
The landmark study by Helgadottir et al identified two
different haplotypes as CAD risk factors in populations of
different ancestry. According to the haplotype block
definition proposed by Gabriel et al,17 we observed three
haplotype blocks of two SNPs each (block 1: SG13S25-
SG13S377; block 2: SG13S32-SG13S41; block 3: SG13S41-
SG13S35). Therefore, the SNPs describing HapA or HapB/C
do not define a single haplotype block. This finding is
consistent with what was observed by Helgadottir et al.4

In Icelanders (a genetic isolate), HapA conferred a
nearly twofold risk of MI.4 This was not confirmed in
British patients, in whom, on the other hand, a different
4-SNP haplotype (HapB) was associated with MI. Neither
HapA nor HapB was associated with incident MI in a cohort
of male US physicians.13 To make a comparison possible,
we focused on a standardized set of seven ALOX5AP SNPs,
allowing reconstruction of the same at-risk haplotypes
reported in the literature. With respect to HapA, our results
suggest that this haplotype may not be informative for
risk assessment of CAD in non-Icelandic populations. With
respect to HapB, a modest contribution of this genetic
marker to CAD risk was observed. Haplotype analyses
revealed in our population a nominally significant
association between CAD and another ALOX5AP haplotype
('HapC'), unremarkable in previous studies. Considering
also the low frequency of this haplotype, the relevance of
this finding remains uncertain. The observed differences
among populations are not surprising, and may relate in
part to population-specific differences in allele and
haplotype frequencies (for a summary of previous studies see
Table 5). For example, the frequency of HapA in Icelandic
controls (9.5%) was well below that observed in North
American (15%), German (15.2%), and our Italian (18.6%)
populations. Moreover, it has to be underscored that we are
dealing with disease-risk-associated haplotypes made of
SNPs with no obvious potential effects on function, whose
association(s) with yet unidentified causal variant(s) in
ALOX5AP may differ between populations with differing
genealogies. In other words, it would not be unexpected to
find in the future different pathogenic ALOX5AP
mutation(s), with different frequencies among populations,
arising on different haplotype backgrounds. Notably,
a replication study in a Japanese population18 found the
allele frequencies of HapA/HapB SNPs too low to permit a
meaningful association analysis. Nevertheless, in that
population haplotypes constructed on the basis of two other
intronic SNPs were significantly associated with MI.

ALOX5AP, leukotriene pathway, and CAD 
pathogenesis 

Preliminary functional data by Helgadottir et al indicated
that some at-risk haplotypes were associated with increased
neutrophil release of leukotriene B4 (LTB4). Because LTB4
is synthesized from LTA4, this implies that ALOX5AP variants
might determine a proinflammatory gain of function. The
role of inflammation in CAD pathogenesis is now well
established (reviewed by Hansson19). The FLAP protein
encoded by ALOX5AP has an important role in the initial
steps of the biosynthesis of leukotrienes,5,6 which in turn
have a variety of proinflammatory effects.20 Besides the
ALOX5AP story, genetic evidence for the involvement of
the 5-LO/leukotriene pathway in CAD is accumulating.21-23
The same Icelandic group recently reported that another
gene involved in this pathway, that is, leukotriene A4
hydrolase, conferred risk of CAD, especially in African
Americans.21 Dwyer et al22 found an association between
promoter variations of the ALOX5 gene (encoding 5-LO,
ie the FLAP target) and carotid intima-media thickness
(a preclinical surrogate marker of atherosclerosis). As a
functional counterpart of these intriguing genetic studies,
a body of animal experiments has linked the 5-LO pathway to
atherosclerosis, although results are sometimes discordant
(critically reviewed by Funk23). Interestingly, much of the
basic research leading to the 'lipoxygenases hypothesis'24,25
points towards an involvement in early events of atheroma
development, through LTB4-mediated migration and




Table 4 ALOX5AP haplotype distribution in the study
population stratified according to the presence or absence
of angiographically documented CAD (a); ORs with 95%
CIs for CAD for HapB region haplotypes, calculated by
means of haplotype-based logistic regression analysis
adjusted for traditional risk factors for CAD, that is, sex,
age, smoking, hypertension, diabetes, and plasma lipids
(b); ALOX5AP haplotype distribution in the CAD group
stratified according to absence (no-MI) or presence (MI) of
previously documented myocardial infarction (c)

(a)
ALOX5AP haplotype        CAD-free (%)   CAD (%)   P-values*

HapA SNPs (SG13S25, SG13S114, SG13S89, SG13S32)
  G-T-G-A (HapA) a       18.6           16.9      0.937
  G-T-G-C                35.8           37.5
  G-A-G-A                22.7           22.3
  G-A-G-C                8.6            8.8
  G-A-A-C                5.5            5.5
  A-T-G-A                7.4            7.8

HapB/C SNPs (SG13S377, SG13S114, SG13S41, SG13S35)
  G-T-A-G                58.3           56.8      0.014
  G-T-A-A (HapC) a       1.6            3.7
  G-T-G-G                1.4            1.1
  G-A-A-G                17.1           16.0
  G-A-G-G                7.3            7.7
  A-T-A-G                1.5            1.0
  A-A-A-G (HapB) a       5.5            7.5
  A-A-A-A                4.8            4.6

(b)
HapB/C SNPs              OR for CAD          P-values†   P-values‡

  G-T-A-A (HapC) a       2.41 (1.09-5.32)    0.030       0.021
  A-A-A-G (HapB) a       1.67 (1.04-2.67)    0.032       0.013

(c)
ALOX5AP haplotype        No-MI (%)      MI (%)    P-values*

HapA SNPs (SG13S25, SG13S114, SG13S89, SG13S32)
  G-T-G-A (HapA) a       15.7           17.6      0.587
  G-T-G-C                36.6           38.2
  G-A-G-A                24.1           21.2
  G-A-G-C                8.9            8.8
  G-A-A-C                5.7            5.4
  A-T-G-A                8.3            7.5

HapB/C SNPs (SG13S377, SG13S114, SG13S41, SG13S35)
  G-T-A-G                55.6           57.5      0.547
  G-T-A-A (HapC) a       3.5            3.9
  G-T-G-G                0.9            1.3
  G-A-A-G                17.2           15.2
  G-A-G-G                6.9            8.3
  A-T-A-G                1.0            1.1
  A-A-A-G (HapB) a       8.4            7.0
  A-A-A-A                4.5            4.7

*By regression analysis.
†By regression analysis adjusted for sex, age, smoking, hypertension,
diabetes, total cholesterol, and triglycerides.
‡By randomization test after 1000 permutations.
a HapA is defined by SG13S25, SG13S114, SG13S89, and SG13S32
SNPs, with alleles G, T, G, A, respectively. HapB and C are defined by
SG13S377, SG13S114, SG13S41, and SG13S35 SNPs, with alleles A,
A, A, G, or G, T, A, A, respectively.
Bold characters underscore the haplotypes with a significantly different
distribution.
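
As a consistency check on the odds ratios in Table 4b, a two-sided P-value can be approximately recovered from an OR and its 95% CI, assuming the interval is symmetric on the log scale (a Wald-type interval). This is only a sketch under that assumption; the function name is hypothetical and the calculation is not the authors' haplotype-based regression.

```python
import math

def p_from_or_ci(odds_ratio, ci_low, ci_high):
    """Approximate two-sided P-value from an OR and its 95% CI.

    Assumes a Wald interval: log(OR) +/- 1.96 * SE.
    """
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * 1.959964)
    z = math.log(odds_ratio) / se
    # Two-sided tail probability of a standard normal variable
    return math.erfc(abs(z) / math.sqrt(2))

p_hap_b = p_from_or_ci(1.67, 1.04, 2.67)   # Table 4b reports 0.032
p_hap_c = p_from_or_ci(2.41, 1.09, 5.32)   # Table 4b reports 0.030
```

Both recovered values land close to the adjusted P-values printed in the table, which suggests the reported ORs, intervals, and P-values are mutually consistent.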



Table 5 Frequencies of HapA, HapB, and some ALOX5AP
SNPs in studies published so far (in patients with MI or
stroke) and in our study

Study (author's name)      Controls (%)     Cases (%)        P-values

Helgadottir et al,4
Icelandic cohort
  HapA                     9.5              15.8 a, 14.9 b   <0.001 a, <0.001 b
  SG13S114 allele T        [illegible]      [illegible] a    [illegible]
British cohort
  HapA                     16.8             15.1 a           NS
  HapB                     4.0              7.5 a            <0.001

Lohmussaar et al,11
German cohort
  HapA                     15.2             14.5 b           NS
  SG13S25 allele G         90.1             89.4 b           NS
  SG13S114 allele T        65.0             68.5 b           0.025
  SG13S89 allele G         96.0             94.7 b           NS
  SG13S32 allele A         46.7             46.9 b           NS

Helgadottir et al,29
Scottish cohort
  HapA                     14 (decimal      18.4 b           0.007
                           illegible)
  HapB                     5.8              6.8 b            NS

Kajimoto et al,18
Japanese cohort
  SG13S25 allele G         99.97            100 a            0.557
  SG13S377 allele G        81.6             80.0 a           0.243
  SG13S114 allele T        64.7             64.1 a           0.298
  SG13S89 allele G         99.2             99.0 a           0.603
  SG13S32 allele A         64.9             65.1 a           0.428
  SG13S41 allele A         99.2             98.7 a           0.303
  SG13S35 allele G         100              100 a
  A162C allele C           48.8             44.7 a           0.129
  T8733A allele A          43.6             42.6 a           0.570
  Haplotype 162A-8733A     20.0             25.8 a           0.003
  Haplotype 162C-8733A     23.6             16.9 a           0.001

Meschia et al,12
North American cohort
  SG13S25 allele G         87.9             89.7 b           0.200
  SG13S114 allele T        57.8             59.1 b           0.180
  SG13S89 allele G         87.4             91.2 b           0.150
  SG13S32 allele A         49.2             51.3 b           0.790

Zee et al,13
US cohort
  HapA                     14 c, 15 d       17 a, 18 b       0.460 a, 0.710 b
  HapB                     7 c, 7 d         6 a, 8 b         0.080 a, 0.470 b
  SG13S25 allele G         90 c,            90 a, 90 b       0.890 a, 0.470 b
                           [illegible] d
  SG13S377 allele G        87 c, 83 d       88 a, 87 b       0.410 a, 0.150 b
  SG13S114 allele T        68 c, 63 d       68 a, 63 b       0.630 a, 0.990 b
  SG13S89 allele G         95 c, 94 d       94 a, 94 b       0.840 a, 0.960 b
  SG13S32 allele A         46 c, 52 d       50 a, 52 b       0.150 a, 0.990 b
  SG13S41 allele A         91 c, 92 d       91 a, 92 b       0.730 a, 0.680 b
  SG13S35 allele G         91 c, 89 d       93 a, 91 b       0.210 a, 0.260 b

This study, 2006
Italian cohort
  HapA                     18.6             16.9 e           0.937






Table 5 (Continued)

Study (author's name)      Controls (%)     Cases (%)        P-values

  HapB                     5.5              7.5 e            0.014
  SG13S25 allele G         92.6             92.0 e           0.682
  SG13S377 allele G        87.2             86.2 e           0.530
  SG13S114 allele T        63.3             63.0 e           0.921
  SG13S89 allele G         93.1             93.6 e           0.727
  SG13S32 allele A         48.7             47.6 e           0.620
  SG13S41 allele A         90.5             90.6 e           0.995
  SG13S35 allele G         91.7             90.5 e           0.396

NS, nonsignificant.
a Myocardial infarction.
b Stroke.
c Control group for myocardial infarction.
d Control group for stroke.
e Coronary artery disease.
Italic numbers indicate the characteristics of case and control groups.



activation of monocyte/macrophages, as well as
lipoxygenase-mediated LDL oxidation.

Peculiarities of the present study: strengths and 
limitations 

Previous studies on ALOX5AP focused on MI patients versus
controls selected from the general population or from
event-free subjects such as in the prospective Physicians'
Health Study cohort.4,13 MI is usually a late thrombotic
complication superimposed on coronary atherosclerotic
plaque rupture,26 so that the design of previous studies did
not directly allow separation of a putative specific role of
ALOX5AP in MI from a role in CAD development. Our
alternative experimental design focused on subjects with
angiographically proven CAD, with or without a previous
documented MI. Moreover, the angiography-based design
enabled us to select CAD-free subjects with an objectively
defined control status, a critical issue in genetic association
studies.27 This allowed us to overcome the caveat, common
in Western general populations where atherosclerosis is
endemic, of enrolling controls with substantial coronary
atherosclerotic lesions, although not yet clinically evident.
While our CAD-free subjects cannot be considered a
'typical' control group, we feel confident that they are
acceptably representative of the background general
population, since their genotype and haplotype
distributions were not fundamentally different from those
observed in controls from German and US populations (see
above). Since we noted haplotype differences only between the
whole CAD group and the CAD-free group, and not
between CAD patients with or without MI, our data appear
to be consistent with a more relevant role of ALOX5AP in
atherogenesis rather than in thrombogenesis, in line with
many of the above-mentioned biochemical data.

This study suffers from the common limitations of genetic
association studies with complex traits.28 Despite the
imbalance between cases and controls, it was sufficiently
powered to detect a predefined effect of ALOX5AP
haplotypes on CAD (see above). On the other hand, we
could not properly analyse some interesting issues such as
a possible stronger effect of ALOX5AP in males than in
females,4 because of the limited number of women
enrolled.

Finally, from a practical perspective, the relatively low
frequency of 'at-risk' haplotypes in our population, as well
as their modest effect on CAD risk, has to be taken into
account.



Conclusions 

ALOX5AP represents the paradigm of a new class of
promising genes identified by powerful genome-wide
investigations, which are currently the object of intense
investigation to confirm their role in CAD susceptibility.
Our data neither refute nor strongly support this
hypothesis. Adding them to current knowledge, some evidence
on ALOX5AP as a genetic susceptibility factor for CAD has
now emerged in four out of five independent populations
(Icelandic, British, Japanese, and Italian; but not in North
America). Our angiography-based study suggests a possible
role of ALOX5AP/FLAP in the development of the atheroma
rather than in its late thrombotic complications such as
MI. Such a role, if any, appears to be modest. Much further
work is needed to understand the reason(s) for the
heterogeneous results, as well as to identify possible
ALOX5AP pathogenic variations.



Acknowledgements 

This work was supported by grants from the Veneto Region, Italian
Ministry of University and Research (Grant no. 2005/065152), and
the Cariverona Foundation, Verona, Italy. We wish to thank Mrs
Maria Zoppi for invaluable secretarial help, and Mr Diego Minguzzi for
technical assistance. The authors declare that they have no potential
conflict of interests.



References 

1 Watkins H, Farrall M: Genetic susceptibility to coronary artery 
disease: from promise to progress. Nat Rev Genet 2006; 7: 
163-173. 

2 Lusis AJ, Fogelman AM, Fonarow GC: Genetic basis of athero- 
sclerosis, Part I, New genes and pathways. Circulation 2004; 110: 
1868-1873. 

3 Wang Q: Molecular genetics of coronary artery disease. Curr Opin 
Cardiol 2005; 20: 182-188. 

4 Helgadottir A, Manolescu A, Thorleifsson G et al: The gene 
encoding 5-lipoxygenase activating protein confers risk of 
myocardial infarction and stroke. Nat Genet 2004; 36: 233-239. 

5 Dixon RAF, Diehl RE, Opas E et al: Requirement of a 5- 
lipoxygenase-activating protein for leukotriene synthesis. Nature 
1990; 343: 282-284. 

6 Miller DK, Gillard JW, Vickers PJ et al: Identification and isolation
of a membrane protein necessary for leukotriene production.
Nature 1990; 343: 278-281.




7 Spanbroek R, Grabner R, Lotzer K et al: Expanding expression of 
5-lipoxygenase pathway within the arterial wall during human 
atherogenesis. Proc Natl Acad Sci USA 2003; 100: 1238-1243. 

8 Qiu H, Gabrielsen A, Agardh HE et al: Expression of 5-
lipoxygenase and leukotriene A4 hydrolase in human athero-
sclerotic lesions correlates with symptoms of plaque instability.
Proc Natl Acad Sci USA 2006; 103: 8161-8166.

9 Mehrabian M, Allayee H, Wong J et al: Identification of 5- 
lipoxygenase as a major gene contributing to atherosclerosis 
susceptibility in mice. Circ Res 2002; 91: 120-126. 

10 Zhao L, Funk CD: Lipoxygenase pathways in atherogenesis. 
Trends Cardiovasc Med 2004; 14: 191 - 195. 

11 Lohmussaar E, Gschwendtner A, Mueller JC et al: ALOX5AP gene
and the PDE4D gene in a central European population of stroke
patients. Stroke 2005; 36: 731-736.

12 Meschia JF, Brott TG, Brown RD et al: Phosphodiesterase 4D and 
5-lipoxygenase activating protein in ischemic stroke. Ann Neurol 
2005;58:351-361. 

13 Zee RY, Cheng S, Hegener HH, Erlich HA, Ridker PM: Genetic 
variants of arachidonate 5-lipoxygenase activating protein and 
risk of incident myocardial infarction and ischemic stroke. Stroke 
2006; 37: 2007-2011. 

14 Girelli D, Russo C, Ferraresi P et al: Polymorphisms in the factor 
VII gene and the risk of myocardial infarction in patients with 
coronary artery disease. N Engl J Med 2000; 343: 774- 780. 

15 Devlin B, Risch N: A comparison of linkage disequilibrium
measures for fine-scale mapping. Genomics 1995; 29: 311-322.

16 Lake SL, Lyon H, Tantisira K et al: Estimation and tests of 
haplotype -environment interaction when linkage phase is 
ambiguous. Hum Hered 2003; 55: 56-65. 

17 Gabriel SB, Schaffner SF, Nguyen H et al: The structure of 
haplotype blocks in the human genome. Science 2002; 296: 
2225-2229. 

18 Kajimoto K, Shioji K, Ishida C et al: Validation of the association
between the gene encoding 5-lipoxygenase-activating protein
and myocardial infarction in a Japanese population. Circ J 2005;
69: 1029-1034.

19 Hansson GK: Inflammation, atherosclerosis, and coronary artery
disease. N Engl J Med 2005; 352: 1685-1695.

20 Samuelsson B: Leukotrienes: mediators of immediate hypersensi-
tivity reactions and inflammation. Science 1983; 220: 568-575.

21 Helgadottir A, Manolescu A, Helgason A et al: A variant of the
gene encoding leukotriene A4 hydrolase confers ethnicity-
specific risk of myocardial infarction. Nat Genet 2006; 38: 68-74.

22 Dwyer JH, Allayee H, Dwyer KM et al: Arachidonate 5-lipoxygenase
promoter genotype, dietary arachidonic acid and athero-
sclerosis. N Engl J Med 2004; 350: 29-37.

23 Funk CD: Leukotriene modifiers as potential therapeutics for
cardiovascular disease. Nat Rev Drug Discov 2005; 4: 664-672.

24 Steinberg D: At last, direct evidence that lipoxygenases play a role
in atherogenesis. J Clin Invest 1999; 103: 1487-1488.

25 Lotzer K, Funk CD, Habenicht AJR: The 5-lipoxygenase pathway
in arterial wall biology and atherosclerosis. Biochim Biophys Acta
2005; 1736: 30-37.

26 Lusis AJ: Atherosclerosis. Nature 2000; 407: 233-241.

27 Lander ES, Schork NJ: Genetic dissection of complex traits. Science
1994; 265: 2037-2048.

28 Colhoun HM, McKeigue PM, Davey Smith G: Problems of
reporting genetic associations with complex outcomes. Lancet
2003; 361: 865-872.

29 Helgadottir A, Gretarsdottir S, St Clair D et al: Association
between the gene encoding 5-lipoxygenase-activating protein
and stroke replicated in a Scottish population. Am J Hum Genet
2005; 76: 505-509.

Supplementary Information accompanies the paper on European
Journal of Human Genetics website (http://www.nature.com/ejhg)






Genetic Variants of Arachidonate 5-Lipoxygenase- 
Activating Protein, and Risk of Incident Myocardial 
Infarction and Ischemic Stroke 

A Nested Case-Control Approach 

Robert Y.L. Zee, PhD; Suzanne Cheng, PhD; Hillary H. Hegener, BS; Henry A. Erlich, PhD; Paul M. Ridker, MD

Background and Purpose— Recent findings have implicated specific gene polymorphisms of arachidonate 5-lipoxygen- 
ase-activating protein (ALOX5AP), and 2 at-risk haplotypes (HapA, HapB) in myocardial infarction and stroke. To 
date, no prospective data are available. 

Methods— We evaluated 10 specific Icelandic ALOX5AP gene variants among 600 male participants with incident 
atherothrombotic events (myocardial infarction [MI] or ischemic stroke) and among 600 age- and smoking-matched 
male participants, all white, who remained free of reported cardiovascular disease during follow-up within the 
Physicians' Health Study cohort. 

Results— Overall allele, genotype, and haplotype distributions were similar between cases and controls. Single-marker
conditional logistic regression analysis adjusted for potential risk factors found no association with risk of atherothrombotic
events. Further investigation using a haplotype-based approach showed similar null findings with MI (HapA: odds ratio
[OR]=1.18, 95% CI, 0.76 to 1.85; P=0.46; HapB: OR=0.62, 95% CI, 0.36 to 1.07; P=0.08), and with ischemic stroke
(HapA: OR=1.11, 95% CI, 0.65 to 1.89; P=0.71; HapB: OR=0.82, 95% CI, 0.47 to 1.42; P=0.47).

Conclusions— We found no evidence for an association of the specific Icelandic ALOX5AP gene variants/at-risk haplotypes
tested with risk of incident MI or ischemic stroke in this prospective, non-Icelandic study. (Stroke. 2006;37:2007-2011.)

Key Words: ALOX5AP ■ haplotypes ■ MI ■ risk factors ■ stroke 



Cardiovascular diseases, including myocardial infarc- 
tion (MI) and ischemic stroke, are the leading causes 
of mortality and morbidity in western countries. The 
underlying pathogenesis is likely to be mediated by both 
genetic and environmental risk factors. The initial report, 1 
in an Icelandic population, of a significant association of 
genetic variants of arachidonate 5-lipoxygenase-activating 
protein (ALOX5AP) with increased risk of MI and stroke 
has attracted great interest. In their study, Helgadottir and 
coauthors reported a linkage and association of a 4-single- 
nucleotide polymorphism (SNP) haplotype, HapA, of 
ALOX5AP gene with risk of MI and stroke. 1 In addition, 
they reported an association of a different 4-SNP haplo- 
type, HapB, with risk of MI in a British population. 1 
Helgadottir and coauthors further assessed the contribution 
of ALOX5AP variants, in particular the HapA, and HapB 
haplotypes, to stroke, in a Scottish population, and found 
that the HapA haplotype confers a relative risk of 1.36 
assuming a multiplicative model (P=0.007) for stroke. 2 
However, they found no association for HapB. Subsequent
studies by others in several non-Icelandic populations have
since yielded conflicting results.3,4

To date, no prospective genetic-epidemiological data are
available on risk of MI and ischemic stroke. We therefore
simultaneously evaluated the role of 10 ALOX5AP (GeneID:
241; Chromosome: 13q12) SNPs (SG13S25, SG13S377,
SG13S106, SG13S114, SG13S89, SG13S30, SG13S32, 
SG13S41, SG13S42, and SG13S35), and specific haplotypes 
thereof, in particular HapA, and HapB at-risk haplotypes, as 
risk determinants of incident MI, and ischemic stroke in a 
prospective, nested case-control sample within the Physi- 
cians' Health Study (PHS) cohort. These polymorphisms 
(except SG13S106, SG13S30, and SG13S42: unpublished 
data from deCODE Genetics) were chosen based on the 
associations observed in the Icelandic study. 1 

Materials and Methods 

Study Design 

We used a nested case-control design within the PHS,5 a random-
ized, double-blinded, placebo-controlled trial of aspirin and beta
carotene initiated in 1982 among 22 071 males, predominantly white


Received March 31, 2006; accepted May 2, 2006. 

From the Center for Cardiovascular Disease Prevention, the Donald W. Reynolds Center for Cardiovascular Research, the Leducq Center for Molecular 
and Genetic Epidemiology (R.Y.L.Z., H.H.H., P.M.R.), Brigham and Women's Hospital, Harvard Medical School, Boston, Mass; and the Roche 
Molecular Systems (S.C., H.A.E.), Alameda, Calif. 

Correspondence to Dr Robert Y.L. Zee, Laboratory of Genetic and Molecular Epidemiology, Center for Cardiovascular Disease Prevention, Brigham
and Women's Hospital, 900 Commonwealth Ave East, Boston, MA 02215. E-mail rzee@rics.bwh.harvard.edu

© 2006 American Heart Association, Inc.

Stroke is available at http://www.strokeaha.org DOI: 10.1161/01.STR.0000229905.25080.01






(>94%) US physicians, 40 to 84 years of age at study entry. Before
randomization, 14 916 participants provided an EDTA-anticoagulated
blood sample that was stored for genetic analysis. All participants were free
of prior MI, stroke, transient ischemic attacks, and cancer at study
entry. As the study participants were all US male physicians, yearly
follow-up self-report questionnaires provide reliable updated infor-
mation on newly developed diseases and the presence or absence of
other cardiovascular risk factors. History of cardiovascular risk
factors, such as hypertension (>140/90 mm Hg or on antihyperten-
sive medication), diabetes, or hyperlipidemia (>240 mg/dL), was
defined by self-report of diagnosis at entry into the study. For all
reported incident vascular events occurring after study enrollment,
hospital records, death certificates, and autopsy reports were re-
quested and reviewed by an end-points committee using standardized
diagnostic criteria.

The diagnosis of MI was confirmed by evidence of symptoms in
the presence of either diagnostic elevations of cardiac enzymes or
diagnostic changes on electrocardiograms. In the case of fatal events,
the diagnosis of MI was also accepted based on autopsy findings.
Stroke was defined by the presence of a new focal neurological
deficit, with symptoms and signs persisting for >24 hours, and was
ascertained from blinded review of medical records, autopsy results,
and the judgment of a board-certified neurologist, on the basis of
clinical reports and computed tomographic or MRI scanning.

For each case (MI or ischemic stroke), a control matched by age, 
smoking history (never, past, or current) and length of follow-up 
were chosen among those subjects who remained free of vascular 
diseases. The present association study consisted of 341 MI case- 
control pairs, and 259 ischemic stroke case-control pairs, all white 
males. 

The study was approved by the Brigham and Women's Hospital 
Institutional Review Board for Human Subjects Research. 

Genotyping Determination 

Genotyping was performed using an immobilized probe approach, as
previously described (Roche Molecular Systems).6 In brief, each
DNA sample was amplified in a multiplex polymerase chain reaction
using biotinylated primers. Each polymerase chain reaction product
pool was then hybridized to a panel of sequence-specific oligonu-
cleotide probes immobilized in a linear array. The colorimetric
detection method was based on the use of streptavidin-horseradish
peroxidase conjugate with hydrogen peroxide and 3,3',5,5'-
tetramethylbenzidine as substrates.

To confirm genotype assignment, scoring was carried out by 2 
independent observers. Discordant results (<1% of all scoring) were 
resolved by a joint reading, and where necessary, a repeat genotyp- 
ing. Results were scored blinded as to case-control status. Overall 
completion rate of genotyping determination was >95%. 

Statistical Analysis 

Allele and genotype frequencies among cases and controls were
compared with values predicted by Hardy-Weinberg equilibrium
using the χ² test. Relative risks associated with each genotype were
calculated separately by conditional logistic regression analysis
conditioning on the matching by age, smoking status, and length of
follow-up since randomization, and further controlling for randomized
treatment assignment, history of hypertension, presence or
absence of diabetes, and body mass index, assuming an additive,
dominant, or recessive mode of inheritance. Pairwise linkage disequilibrium
(LD) was examined as described by Devlin and Risch. 7
For comparison with published reports by others, we examined 2
previously described at-risk haplotypes: HapA
(SG13S25G-SG13S114T-SG13S89G-SG13S32A), and HapB
(SG13S377A-SG13S114A-SG13S41A-SG13S35G). Haplotype estimation and
inference were performed using PHASE v2.1. 8,9 Haplotype
distributions between cases and controls were examined by likelihood
ratio test. The relationship between haplotypes and clinical
outcomes was examined using a haplotype-based logistic regression
analysis with baseline-parameterization, 10 adjusting for the same risk
factors. All analyses were carried out using SAS/Genetics 9.1



TABLE 1. Baseline Characteristics of Study Participants Who
Subsequently Developed Any Arterial Event (Cases), and Those
Who Remained Free of Vascular Disease During Follow-Up
(Controls)

                                  Controls      Cases
                                  (n=600)       (n=600)       P
Age, y                            60.8±0.3      [illegible]   m.v.
Smoking status, %                                             m.v.
  Never                           41.7          41.7
  Past                            41.5          41.5
  Current                         16.8          16.8
Body mass index, kg/m²            24.9±0.1      25.4±0.1      0.001
Blood pressure, mm Hg
  Systolic                        128.6±0.5     132.7±0.6     <0.0001
  Diastolic                       79.6±0.3      81.8±0.3      <0.0001
Hyperlipidemia, %                 14.9          22.8          <0.001
Hypertension, %                   29.0          47.2          <0.0001
Diabetes, %                       2.8           8.9           <0.0001
Aspirin use, %                    46.3          44.8          0.61
Family history of premature
  CAD <60 years of age, %         8.9           10.9          0.24

package (SAS Institute, Inc). For each odds ratio (OR), we calculated 

Downloaded from stroke.ahajournals.org by on October 16, 2007



Mean±SE unless otherwise stated.
m.v. indicates matching variable; CAD, coronary artery disease.
Continuous and categorical variables were tested by paired t test and
McNemar test, respectively.

95% CIs. A 2-tailed P value of 0.05 was considered a statistically 
significant result. 
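
As a rough illustration of how an odds ratio and its 95% CI can be obtained from a 2×2 table, the following minimal sketch uses the unadjusted Woolf (log-based) interval. It is not the conditional logistic regression that the authors ran in SAS/Genetics; the function name `odds_ratio_ci` and the counts are hypothetical and serve only as an example.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.959964):
    """Unadjusted odds ratio with a Woolf (log-based) confidence interval.

    a: carriers among cases       b: non-carriers among cases
    c: carriers among controls    d: non-carriers among controls
    z: normal quantile (1.959964 gives a two-sided 95% interval)
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, not taken from the study.
or_, lo, hi = odds_ratio_ci(60, 540, 50, 550)
print(f"OR = {or_:.2f}, 95% CI {lo:.2f}-{hi:.2f}")
```

An interval that spans 1, as it does for these illustrative counts, corresponds to a result that is not significant at the 2-tailed 0.05 level.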

Results 

Baseline characteristics of cases and controls are shown in 
Table 1. As expected, the case participants had a higher 
prevalence of traditional cardiovascular risk factors at base- 
line as compared with controls. The genotype frequencies for 
the polymorphisms tested were in Hardy-Weinberg equilib- 
rium in the control group and in the case group. 

Using a single-marker χ² analysis, allele and genotype
distributions were similar between cases and controls 
(Table 2). Results from the adjusted conditional logistic 
regression analysis, assuming additive, dominant, or reces- 
sive mode of inheritance, showed no significant associa- 
tion of the variants tested with the clinical outcomes 
(P>0.07; data not shown). In general, the polymorphisms 
tested were in LD (supplemental Table I, available online 
at http://stroke.ahajournals.org). The overall haplotype 
distributions between cases and controls were similar (MI: 
HapA region, P=0.79, HapB region, P=0.94; ischemic 
stroke: HapA region, P=0.77, HapB region, P=0.26; 
supplemental Table II, available online at http:// 
stroke.ahajournals.org). The most frequent haplotypes 
were G-T-G-C, and G-T-A-G for HapA region, and HapB 
region, respectively (supplemental Table II), and thus were 
used as the referents. Results from the adjusted haplotype- 
based conditional logistic regression analysis again 
showed similar null findings (supplemental Table III, 
available online at http://stroke.ahajournals.org). 
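
The Hardy-Weinberg check reported in the Results can be sketched as a χ² goodness-of-fit test with 1 degree of freedom. This is a minimal illustration under stated assumptions, not the authors' SAS procedure: the function name `hwe_chi_square` and the genotype counts are hypothetical, and the P value uses the identity that the upper tail of a χ² distribution with 1 degree of freedom equals erfc(sqrt(x/2)).

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 df) of genotype counts against Hardy-Weinberg proportions."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1 - p                         # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical genotype counts (AA, AB, BB), chosen only for illustration.
chi2, p_value = hwe_chi_square(90, 160, 70)
print(f"chi2 = {chi2:.4f}, P = {p_value:.3f}")
```

A large P value, as for these illustrative counts, is what "in Hardy-Weinberg equilibrium" means in practice: the observed genotype counts do not differ detectably from those predicted by the allele frequencies.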



Zee et al ALOX5AP Gene Variants and Atherothrombosis 



TABLE 2. Genotype and Allele Distribution

ALOX5AP Genotype, %   MI Controls   MI Cases   P      IsST Controls   IsST Cases   P
SG13S25                                        0.80                                0.29
  GG                  81.31         80.56             83.13           79.58
  GA                  18.07         19.14             15.64           20.00
  AA                  0.62          0.31              1.23            0.42
  Allele                                       0.89                                0.47
  G                   0.90          0.90              0.91            0.90
  A                   0.10          0.10              0.09            0.10
SG13S377                                       0.71                                0.35
  GG                  75.39         78.09             70.37           75.42
  GA                  23.05         20.68             25.93           22.50
  AA                  1.56          1.23              3.70            2.08
  Allele                                       0.41                                0.15
  G                   0.87          0.88              0.83            0.87
  A                   0.13          0.12              0.17            0.13
SG13S106                                       0.54                                0.20
  GG                  50.16         46.60             45.27           45.00
  GA                  37.69         41.98             44.86           40.00
  AA                  12.15         11.42             9.88            15.00
  Allele                                       0.59                                0.38
  G                   0.69          0.68              0.68            0.65
  A                   0.31          0.32              0.32            0.35
SG13S114                                       0.90                                0.96
  TT                  47.04         45.37             41.56           42.08
  TA                  41.43         42.28             43.62           42.50
  AA                  11.53         12.35             14.81           15.42
  Allele                                       0.63                                0.99
  T                   0.68          0.68              0.63            0.63
  A                   0.32          0.32              0.37            0.37
SG13S89                                        0.76                                0.80
  GG                  89.72         88.89             89.71           89.17
  GA                  9.66          10.80             9.47            10.42
  AA                  0.62          0.31              0.82            0.42
  Allele                                       0.84                                0.96
  G                   0.95          0.94              0.94            0.94
  A                   0.05          0.06              0.06            0.06
SG13S30                                        0.83                                0.38
  GG                  58.57         58.95             51.85           57.92
  GT                  37.69         36.42             41.15           36.67
  TT                  3.74          4.63              7.00            5.42
  Allele                                       0.91                                0.17
  G                   0.77          0.77              0.72            0.76
  T                   0.23          0.23              0.28            0.24
SG13S32                                        0.30                                0.32
  CC                  27.73         22.84             24.28           20.83
  CA                  52.96         54.63             47.33           54.17
  AA                  19.31         22.53             28.40           25.00
  Allele                                       0.15                                0.99
  C                   0.54          0.50              0.48            0.48
  A                   0.46          0.50              0.52            0.52

(Continued)






2010 Stroke August 2006 



TABLE 2. Continued

ALOX5AP Genotype, %   MI Controls   MI Cases   P      IsST Controls   IsST Cases   P
SG13S41                                        0.50                                0.89
  AA                  82.87         83.02             84.36           85.42
  AG                  15.58         16.36             14.40           13.75
  GG                  1.56          0.62              1.23            0.83
  Allele                                       0.73                                0.68
  A                   0.91          0.91              0.92            0.92
  G                   0.09          0.09              0.08            0.08
SG13S42                                        0.17                                0.36
  AA                  28.04         34.88             38.68           35.00
  AG                  50.78         45.99             43.62           50.00
  GG                  21.18         19.14             17.70           15.00
  Allele                                       0.11                                0.88
  A                   0.53          0.58              0.60            0.60
  G                   0.47          0.42              0.40            0.40
SG13S35                                        0.08                                0.50
  GG                  81.31         85.80             79.42           83.33
  GA                  18.69         13.58             19.75           16.25
  AA                  0.00          0.62              0.82            0.42
  Allele                                       0.21                                0.26
  G                   0.91          0.93              0.89            0.91
  A                   0.09          0.07              0.11            0.09

IsST indicates ischemic stroke.
P value for χ² test.



association between HapA and an increased risk of ischemic
stroke (relative risk=1.35; P=0.02), and an over-representation
of HapB (relative risk=1.65; P=0.02) with ischemic stroke in a
Scottish male sample population 2 (Table 3). Recently, Lohmussaar
and coauthors 3 reported that sequence variants in the
ALOX5AP gene are significantly associated with stroke, particularly
in males, in a Central European sample population. A
nominally significant association with stroke was observed for
SG13S114 (OR=1.24; P=0.017), and SG13S100 (OR=1.26;
P=0.024). However, they found no association of HapA with
stroke risk. 3 More recently, Meschia and coauthors conducted
the first replication study using a North American sample

TABLE 3. Summary of ALOX5AP At-Risk-Haplotypes Association Studies

                             HapA                                                  HapB
                             MI                         Stroke                     MI                         Stroke
                             Conf, Casf, R, P           Conf, Casf, R, P           Conf, Casf, R, P           Conf, Casf, R, P
Present study United States  0.14, 0.17, 1.18, 0.46     0.18, 0.15, 1.11, 0.71     0.07, 0.06, 0.62, 0.08     0.08, 0.07, 0.82, 0.47
Iceland 1                    0.10, 0.16, 1.80, <0.0001  0.10, 0.15, 1.67, <0.0001  Not available              *0.07, 0.07, 1.09, ns
United Kingdom 1             0.15, 0.17, ns             Not available              0.04, 0.08, 1.95, 0.00037  Not available
Scotland 2                   Not available              0.14, 0.18, 1.35, 0.02     Not available              0.06, 0.09, 1.65, 0.02
Germany 3                    Not available              0.15, 0.15, ns             Not available              ns (data not shown)
North America 4              Not available              ns (data not shown)        Not available              Not available

Conf indicates haplotype frequency in controls; Casf, haplotype frequency in cases; R, risk estimate; ns, nonsignificance.
HapA=SG13S25G-SG13S114T-SG13S89G-SG13S32A; HapB=SG13S377A-SG13S114A-SG13S41A-SG13S35G.
*Data extracted from reference 2.




Discussion 

The present prospective investigation provides no evidence 
for an association of the specific gene variants, nor at-risk 
haplotypes of the ALOX5AP gene, previously suggested as 
genetic risk determinants, with MI or stroke in a non- 
Icelandic white population. 

In the initial Icelandic report, 1 a 4-SNP haplotype (HapA) was 
found to be associated with a 2X greater risk of MI, and an 
almost 2X greater risk of stroke. The same group also reported 
an association of a different 4-SNP ALOX5AP haplotype 
(HapB) with risk of MI in a British sample population 1 (Table 
3). A subsequent report by Helgadottir and coauthors found an 






population, and found no association between ALOX5AP gene 
variants and stroke, although MI was not investigated in their 
study. 

Given this situation, a possible explanation for the apparent 
discrepancies is that the observed allele, genotype, and at-risk 
haplotype frequencies for the SNPs examined may differ 
between studies, which could be the result of population/ 
ethnic differences. As previously suggested, 3,4 the ALOX5AP
gene variation may play a substantial role in risk of MI, and 
stroke in Iceland (an isolate population), but a lesser role in 
non-Icelandic populations because of different population LD 
structures. These recent results are consistent with the initial 
report that different at-risk haplotypes were found between 
the Icelandic and British study populations. 1 

As shown in Table 3, not all of the published reports
examined the same set of SNPs, nor did all of the reported 
studies examine the association of ALOX5AP variants with 
MI and stroke simultaneously. Further, not all published 
studies presented information on allele, genotype and at-risk 
haplotype frequencies, LD structure, and risk estimates, thus 
making a direct comparison and informative interpretation 
across studies difficult. 

It has been noted in the initial report 1 that variants of 
ALOX5AP gene are involved in the pathophysiology of MI 
and stroke by increasing the production of leukotriene B4, a 
critical regulator in the 5-lipoxygenase pathway, and a 
proinflammatory agent. Leukotrienes are arachidonic acid 
metabolites, which have been implicated in various inflam- 
matory conditions, including asthma, arthritis, psoriasis, and 
atherosclerosis. 11,12 Notably, a recent article by the same
Icelandic group found a haplotype (HapK) of the gene 
encoding leukotriene A4 hydrolase, a protein in the same 
biochemical pathway of ALOX5AP, confers ethnicity- 
specific (particularly in blacks) risk of MI. 13 

The prospective nature of the PHS study and the use of a 
closed population sampling scheme in which subsequent case 
status was determined solely by the development of disease 
strongly reduce the possibility that our findings are attributable 
to bias or confounding. Our study cohort consists of entirely 
white males with distinct socioeconomic status (physicians), so 
our data cannot be generalized to other ethnic groups and 
women. In our study, we had the ability to detect, based on the 
present sample sizes, assuming 80% power, at an α of 0.05, a
risk ratio of >1.54 (MI), and 1.64 (ischemic stroke) if the minor
allele frequency is 0.50, and of >2.26 (MI), and 2.49 (ischemic 
stroke) if the minor allele frequency is 0.05 assuming a univari- 
able-additive mode. Thus, we cannot rule out a modest risk of 
cardiovascular disease associated with the polymorphisms/hap- 
lotypes tested. It is important to recognize that association 
studies like this one can only examine the possible association 
between phenotype and the tested polymorphisms. Our study 
therefore cannot exclude the possibility that examination of 
different polymorphisms/loci, which would by definition have to 
be in linkage disequilibrium with the ones tested, might obtain 
different results. 

In conclusion, our prospective study found no evidence for 
an association of specific Icelandic ALOX5AP gene polymorphisms/at-risk
haplotypes examined with risk of atherothrombotic
events. If corroborated in other non-Icelandic



prospective studies, our data suggest that ALOX5AP gene 
variation is not informative for risk assessment of athero- 
thrombosis in non-Icelandic populations. 

Acknowledgments 

We thank Michael Grow and Houman Khakpbur (both at Roche 
Molecular Systems) for their expertise and efforts in developing the 
genotyping reagents used for this study. We thank Anna Helgadottir 
at deCODE Genetics for sharing unpublished data.

Sources of Funding 

This work was supported by grants from the National Heart Lung 
and Blood Institute (HL-58755 and HL-63293), the Doris Duke 
Charitable Foundation, the American Heart Association, and the 
Donald W. Reynolds Foundation, Las Vegas, Nevada. 

Disclosures 

None. 

References 

1. Helgadottir A, Manolescu A, Thorleifsson G, Gretarsdottir S, Jonsdottir 
H, Thorsteinsdottir U, Samani NJ, Gudmundsson G, Grant SF, 
Thorgeirsson G, Sveinbjornsdottir S, Valdimarsson EM, Matthiasson SE,
Johannsson H, Gudmundsdottir O, Gurney ME, Sainz J, Thorhallsdottir
M, Andresdottir M, Frigge ML, Topol EJ, Kong A, Gudnason V, 
Hakonarson H, Gulcher JR, Stefansson K. The gene encoding 5-lipoxy- 
genase activating protein confers risk of myocardial infarction and stroke. 
Nat Genet. 2004;36:233-239. 

2. Helgadottir A, Gretarsdottir S, St Clair D, Manolescu A, Cheung J, Thorleifsson 
G, Pasdar A, Grant SF, Whalley LJ, Hakonarson H, Thorsteinsdottir U, Kong 
A, Gulcher J, Stefansson K, MacLeod MJ. Association between the gene 
encoding 5-lipoxygenase-activating protein and stroke replicated in a Scottish 
population. Am J Hum Genet. 2005;76:505-509. 

3. Lohmussaar E, Gschwendtner A, Mueller JC, Org T, Wichmann E, 
Hamann G, Meitinger T, Dichgans M. ALOX5AP gene and the PDE4D
gene in a central European population of stroke patients. Stroke. 2005; 
36:731-736. 

4. Meschia JF, Brott TG, Brown RD Jr, Crook R, Worrall BB, Kissela B, 
Brown WM, Rich SS, Case LD, Evans EW, Hague S, Singleton A, Hardy 
J. Phosphodiesterase 4d and 5-lipoxygenase activating protein in ischemic 
stroke. Ann Neurol. 2005;58:351-361. 

5. Physician's health study. Aspirin and primary prevention of coronary 
heart disease. N Engl J Med. 1989;321:1825-1828. 

6. Cheng S, Grow MA, Pallaud C, Klitz W, Erlich HA, Visvikis S, Chen JJ, 
Pullinger CR, Malloy MJ, Siest G, Kane JP. A multilocus genotyping 
assay for candidate markers of cardiovascular disease risk. Genome Res. 
1999;9:936-949. 

7. Devlin B, Risch N. A comparison of linkage disequilibrium measures for
fine-scale mapping. Genomics. 1995;29:311-322. 

8. Stephens M, Smith NJ, Donnelly P. A new statistical method for haplotype 
reconstruction from population data. Am J Hum Genet. 2001;68:978-989.

9. Stephens M, Donnelly P. A comparison of Bayesian methods for hap- 
lotype reconstruction from population genotype data. Am J Hum Genet. 
2003;73:1162-1169. 

10. Wallenstein S, Hodge SE, Weston A. Logistic regression model for 
analyzing extended haplotype data. Genet Epidemiol. 1998;15:173-181. 

11. Samuelsson B. Leukotrienes: mediators of immediate hypersensitivity 
reactions and inflammation. Science. 1983;220:568-575. 

12. Spanbroek R, Grabner R, Lotzer K, Hildner M, Urbach A, Ruhling K, 
Moos MP, Kaiser B, Cohnert TU, Wahlers T, Zieske A, Plenz G,
Robenek H, Salbach P, Kuhn H, Radmark O, Samuelsson B, Habenicht
AJ. Expanding expression of the 5-lipoxygenase pathway within the
arterial wall during human atherogenesis. Proc Natl Acad Sci USA. 
2003;100:1238-1243. 

13. Helgadottir A, Manolescu A, Helgason A, Thorleifsson G, Thorsteins- 
dottir U, Gudbjartsson DF, Gretarsdottir S, Magnusson KP, 
Gudmundsson G, Hicks A, Jonsson T, Grant SF, Sainz J, O'Brien SJ, 
Sveinbjornsdottir S, Valdimarsson EM, Matthiasson SE, Levey AI,
Abramson JL, Reilly MP, Vaccarino V, Wolfe ML, Gudnason V,
Quyyumi AA, Topol EJ, Rader DJ, Thorgeirsson G, Gulcher JR,
Hakonarson H, Kong A, Stefansson K. A variant of the gene encoding 
leukotriene a4 hydrolase confers ethnicity-specific risk of myocardial 
infarction. Nat Genet. 2006;38:68-74. 






February 2007 • Vol. 9 • No. 2



No association of polymorphisms in the gene 
encoding 5-lipoxygenase-activating protein and 
myocardial infarction in a large central European 
population 

Werner Koch, PhD 1, Petra Hoppmann, MD 1, Jakob C Mueller, PhD 2, Albert Schömig, MD 1, and Adrian Kastrati, MD 1
Purpose: Haplotypes based on polymorphisms in the gene encoding 5-lipoxygenase-activating protein have been
linked with susceptibility to myocardial infarction in Iceland and the United Kingdom. We sought to replicate these
association findings in a large case-control sample from Germany. Methods: The case group included 3657
patients with myocardial infarction and the control group comprised 1211 individuals with angiographically normal
coronary arteries and without clinical signs or symptoms of myocardial infarction. Nine different polymorphisms
were genotyped with the use of the TaqMan technique. Results: Genotype, allele, and haplotype analyses did not
reveal significant associations between the polymorphisms and myocardial infarction. The negative results
included a four-marker haplotype, termed HapA haplotype (odds ratio = 1.10; 95% confidence interval: 0.96-1.25),
that was previously found to be related with myocardial infarction in a sample from Iceland, and a different
four-marker haplotype, termed HapB haplotype (odds ratio = 0.94; 95% CI: 0.79-1.12), that was previously linked with
myocardial infarction in a sample from the United Kingdom. Nine-marker haplotypes were not significantly associated
with myocardial infarction in multiple logistic regression models adjusted for covariates (P > 0.38). Conclusion: In this
sample from central Europe, specific polymorphisms in the gene for 5-lipoxygenase-activating protein were not
associated with myocardial infarction, a result contrasting previous positive findings. Genet Med 2007:9(2):123-129.
Key Words: ALOX5AP, 5-lipoxygenase-activating protein (FLAP), genetics, haplotype, myocardial infarction



Specific allelic forms of the gene encoding 5-lipoxygenase-activating
protein (FLAP) have been linked with susceptibility
to myocardial infarction (MI) and stroke. 1-4 These association
findings may reflect a possible relationship of the regulatory
function of FLAP in the inflammatory 5-lipoxygenase pathway
and the important role attributed to inflammatory processes in
atherosclerotic diseases. 5-9 The 5-lipoxygenase cascade leads to
the formation of leukotrienes, which exhibit strong proinflammatory
activities in cardiovascular tissues. 9-11 This pathway is
especially active in arterial walls of patients afflicted with various
lesion stages of atherosclerosis of the aorta and of coronary
and carotid arteries. 10



From the 1 Deutsches Herzzentrum München and 1. Medizinische Klinik, Klinikum rechts
der Isar, Technische Universität München, Munich, Germany; 2 Max Planck Institute for
Ornithology, Department of Behavioural Ecology and Evolutionary Genetics, Starnberg
(Seewiesen), Germany.
The authors declare no conflict of interest.

Werner Koch, PhD, German Heart Center Munich, Lazarettstrasse 36, 80636 München,
Germany. E-mail: wkoch@dhm.mhn.de

A supplementary Appendix is available via the ArticlePlus feature at www.geneticsinmedicine.org.
Please go to the February issue and click on the ArticlePlus link posted with the article in
the Table of Contents to view this material.
Submitted for publication September 27, 2006.
Accepted for publication December 5, 2006.
DOI: 10.1097/GIM.0b013e318030c9c5



The gene for FLAP (ALOX5AP) contains five exons
and spans approximately 31 kb in the chromosome 13q12
region. 12,13 Specific single nucleotide polymorphisms (SNPs),
named SG13S100, SG13S106, SG13S114, and a four-marker
haplotype of ALOX5AP, termed HapA haplotype (SG13S25-G,
SG13S114-T, SG13S89-G, SG13S32-A), were found to be related
to MI in a population sample from Iceland. 1 A different
four-marker haplotype of ALOX5AP, termed HapB haplotype
(SG13S377-A, SG13S114-A, SG13S41-A, SG13S35-G), but not
the HapA haplotype, was linked with MI in a sample from the
United Kingdom (UK). 1 No evidence of an association of the
HapA or HapB haplotype with MI was obtained in a sample of
white male physicians from the United States (US). 14

We examined whether the nine different SNPs mentioned 
above, nine-marker haplotypes of these SNPs, and the HapA 
and HapB haplotypes were associated with MI in a German 
population. The sample consisted of 3657 patients with MI and 
1211 control individuals, all of whom were assessed with cor- 
onary angiography. 

METHODS 

Patients and controls 

Participants were recruited from Southern Germany and examined
at Deutsches Herzzentrum München or 1. Medizinische



Genetics IN Medicine 

Copyright © American College of Medical Genetics. Unauthorized reproduction of this article is prohibited.



Koch et al. 



Klinik rechts der Isar der Technischen Universität München from
1993 to 2002. After catheterization, 5264 individuals were deemed
eligible for inclusion in the MI or control group. Written in- 
formed consent for genetic analysis was obtained from 97.1% 
(n = 5111) of these individuals. In no case was consent with- 
drawn. Blood samples assigned for DNA preparation had been 
collected from 95.2% (n = 4868) of the individuals who agreed to 
participate in the study. These individuals, 3657 patients with MI 
and 1211 controls, constituted the study population. Complete 
genotype data were obtained from all these patients and control 
individuals. The study protocol was approved by the Institutional 
Ethics committee and the reported investigations were in accor- 
dance with the principles of the Declaration of Helsinki. 15 

Definitions 

Individuals were considered disease free and, therefore, eli- 
gible as controls when their coronary arteries were angio- 
graphically normal and when they had no history of MI, no 
symptoms suggestive of MI, no electrocardiographic signs of 
MI, and no regional wall motion abnormalities. Coronary an- 
giography in the control individuals was performed for the 
evaluation of chest pain. The diagnosis of MI was established in 
the presence of chest pain lasting longer than 20 minutes com- 
bined with ST-segment elevation or pathologic Q waves on a 
surface electrocardiogram. Patients with MI had to show either 



an angiographically occluded infarct-related artery or regional 
wall motion abnormalities corresponding to the electrocardiographic
infarct localization, or both. Systemic arterial hypertension
was defined as a systolic blood pressure of ≥140 mm
Hg and/or a diastolic blood pressure of ≥90 mm Hg, 16 on at
least two separate occasions or antihypertensive treatment.
Hypercholesterolemia was defined as a documented total cholesterol
value ≥240 mg/dL (≥6.2 mmol/L) or current treatment
with cholesterol-lowering medication. Persons reporting
regular smoking in the previous 6 months were considered as 
current smokers. Diabetes mellitus was defined as the presence 
of an active treatment with insulin or an oral antidiabetic 
agent; for patients on dietary treatment, documentation of an 
abnormal fasting blood glucose or glucose tolerance test based 
on the World Health Organization criteria 17 was required for 
establishing this diagnosis. 

Genetic analysis 

Genomic DNA was extracted from peripheral blood leuko- 
cytes with the QIAamp DNA Blood Kit (Qiagen, Hilden, Ger- 
many) or the High Pure PCR Template Preparation Kit (Roche 
Applied Science, Mannheim, Germany). We designed and 
used TaqMan allelic discrimination assays for genotype analy- 
sis of nine SNPs in ALOX5AP (Table 1). Primers and probes 
(Table 1 ) were synthesized by Applied Biosystems (Darmstadt, 



Table 1

deCODE    NCBI        SNP    Position in
SNP ID a  dbSNP ID b  bases  AL512642 c   Location               Primer (5'→3')                 Probe (5'→3') d
SG13S25               G>A    26663        Upstream of exon 1     TCTGACAGCATCAGCTAGTCTCTTTC     FAM-CACTGTTGCCCAGTGG
                                                                 AAATTCATGTTGCTGTGTCCATACA      VIC-AGCCACTGTTACCCAGT
SG13S377              G>A    31075        Upstream of exon 1     TTTGGCCAGACTGTCTTGAACTC        FAM-CCTGCCTCGGCCT
                                                                 TGGCTCATGCCTATAATCACAAAA       VIC-CTGCCTCAGCCTC
SG13S100  rs4073259   A>G    33381        Upstream of exon 1     GGTGAAGTGGACTCCCTCCAT          FAM-AGCCAGCGCGCAG
                                                                 CCCCGCTCTGAGCTCCTT             VIC-CAGCCAGTGCGCAG
SG13S106  rs9579646   G>A    37689        Intron 1               TGTGTAGAGCTGTCTTCCTAAAGTTCTG   FAM-AGTTAGGGCTGCCTC
                                                                 AAGCCACTGGAGATAGTTATGAAAGTG    VIC-AGTTAGGACTGCCTCAG
SG13S114  rs10507391  T>A    39206        Intron 1               CCAGATGTATGTCCAAGCCTCTCT       FAM-TGCAATTCTAATTAACCTC
                                                                 CTCTGTAAGGTAGGTCTATGGTTGCAA    VIC-TGCAATTCTATTTAACCTC
SG13S89   rs4769874   G>A    53551        Intron 3               TCGGGAGGCCGTGTTTC              FAM-ATTATCACACGCGCTCT
                                                                 CCAGGGAGCAAGCATTAGCA           VIC-TATCACATGCGCTCTG
SG13S32   rs9551963   A>C    59657        Intron 4               CTGCTTTAGTTCTTGACCTCACCAA      FAM-AAGGATCTCATCTAGCAAT
                                                                 CTGGGGTTCAAGAGAGAAATTCC        VIC-AAGGATCTCATCGAGCAA
SG13S41   rs9315050   A>G    63155        Intron 4               CCTGTCTCCAAATACAGTCCCATT       FAM-ATCTTTACTCTCAGTTCCT
                                                                 AGGTCCCTTCCAAAATTCATATGTT      VIC-TCTTTACCCTCAGTTCC
SG13S35               G>A    67227        Downstream of exon 5   CCTGGCATTGAGGAGTTTTCC          FAM-TAAAAAACCGAAAGGAC
                                                                 ACCCCACAAATACCTACAAATATGTGTAT  VIC-TTAAAAAACTGAAAGGACC

a Helgadottir et al. 1
b NCBI SNP database (http://www.ncbi.nlm.nih.gov/entrez/); last accessed September 27, 2006.
c NCBI nucleotide database (http://www.ncbi.nlm.nih.gov/entrez/); sequence version of May 18, 2005.
d FAM (6-carboxy-fluorescein) or VIC (proprietary dye of Applied Biosystems) was attached to the 5' ends of the probe oligonucleotides. The sequences of the probes
used for analysis of the SG13S25, SG13S377, SG13S106, and SG13S114 SNPs corresponded to the coding strand and the sequences of the probes used for analysis of
the SG13S100, SG13S89, SG13S32, SG13S41, and SG13S35 SNPs corresponded to the noncoding strand; the allele-specific nucleotide in each probe sequence is
underlined.

SNPs, single nucleotide polymorphisms.






ALOX5AP polymorphisms and risk of MI



Table 2
Baseline characteristics of the control group and the MI group

                             Control group   MI group
Characteristic               (n = 1211)      (n = 3657)     P
Age, yr                      60.3 ± 11.9     64.0 ± 12.0    <0.0001
Women                        598 (49.4)      885 (24.2)     <0.0001
Arterial hypertension        589 (48.6)      2246 (61.4)    <0.0001
Hypercholesterolemia         602 (49.7)      2067 (56.5)    <0.0001
Current cigarette smoking    184 (15.2)      1849 (50.6)    <0.0001
Diabetes mellitus            65 (5.4)        754 (20.6)     <0.0001

Age is mean ± SD; other variables are presented as number (%).
MI, myocardial infarction.



Germany). To accomplish allele-specific signaling, the probes
contained the fluorogenic dyes 6-carboxy-fluorescein (FAM)
or VIC (proprietary dye of Applied Biosystems). Minor groove
binder groups were conjugated with the 3' ends of the oligonucleotides
to facilitate formation of stable duplexes between
the probes and their single-stranded DNA targets. 18 Approximately
20% of the DNA samples were retyped with each TaqMan
system to control for correct sample handling and data
acquisition. The results of these repeat assays were in full agreement
with the original genotyping results.

Analyses of PCR products with allele-discriminating restriction
enzymes and/or DNA sequencing were used to verify the
accuracy of TaqMan genotyping. We employed the restriction
enzymes BglI (SG13S25 SNP), HaeIII (SG13S377 SNP), NsbI
(SG13S100 SNP), SatI (SG13S106 SNP), TasI (SG13S114


Table 3
Genotype distributions and allele frequencies of ALOX5AP SNPs in the control group and the MI group

deCODE SNP ID  Genotype  Control group     MI group          P      Allele  Control group   MI group        P
                         (1211 genotypes)  (3657 genotypes)                 (2422 alleles)  (7314 alleles)
SG13S25        GG        963 (79.5)        2949 (80.6)       0.54   G       2162 (89.3)     6579 (90.0)     0.33
               GA        236 (19.5)        681 (18.6)               A       260 (10.7)      735 (10.0)
               AA        12 (1.0)          27 (0.7)
SG13S377       GG        861 (71.1)        2714 (74.2)       0.053  G       2047 (84.5)     6285 (85.9)     0.086
               GA        325 (26.8)        857 (23.4)               A       375 (15.5)      1029 (14.1)
               AA        25 (2.1)          86 (2.4)
SG13S100       AA        494 (40.8)        1461 (40.0)       0.11   A       1521 (62.8)     4636 (63.4)     0.60
               AG        533 (44.0)        1714 (46.9)              G       901 (37.2)      2678 (36.6)
               GG        184 (15.2)        482 (13.2)
SG13S106       GG        568 (46.9)        1697 (46.4)       0.27   G       1644 (67.9)     4998 (68.3)     0.68
               GA        508 (41.9)        1604 (43.9)              A       778 (32.1)      2316 (31.7)
               AA        135 (11.1)        356 (9.7)
SG13S114       TT        526 (43.4)        1591 (43.5)       0.40   T       1586 (65.5)     4842 (66.2)     0.52
               TA        534 (44.1)        1660 (45.4)              A       836 (34.5)      2472 (33.8)
               AA        151 (12.5)        406 (11.1)
SG13S89        GG        1093 (90.3)       3332 (91.1)       0.60   G       2301 (95.0)     6983 (95.5)     0.34
               GA        115 (9.5)         319 (8.7)                A       121 (5.0)       331 (4.5)
               AA        3 (0.2)           6 (0.2)
SG13S32        AA        301 (24.9)        924 (25.3)        0.39   A       1224 (50.5)     3650 (49.9)     0.59
               AC        622 (51.4)        1802 (49.3)              C       1198 (49.5)     3664 (50.1)
               CC        288 (23.8)        931 (25.5)
SG13S41        AA        1047 (86.5)       3166 (86.6)       0.96   A       2253 (93.0)     6810 (93.1)     0.88
               AG        159 (13.1)        478 (13.1)               G       169 (7.0)       504 (6.9)
               GG        5 (0.4)           13 (0.4)
SG13S35        GG        977 (80.7)        3025 (82.7)       0.24   G       2172 (89.7)     6645 (90.9)     0.086
               GA        218 (18.0)        595 (16.3)               A       250 (10.3)      669 (9.1)
               AA        16 (1.3)          37 (1.0)

Variables are presented as number (%) of genotypes or alleles in control individuals and myocardial
infarction patients.
SNPs, single nucleotide polymorphisms.
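
The allele comparison in Table 3 can be reproduced approximately from the genotype counts. The sketch below derives allele counts for SG13S25 from the genotype counts given in the table and applies an uncorrected Pearson χ² test to the resulting 2×2 table. The helper names `allele_counts` and `chi_square_2x2` are hypothetical, and the absence of a continuity correction is an assumption, since the article does not state which variant of the test was used.

```python
import math


def allele_counts(n_hom_major, n_het, n_hom_minor):
    """Return (major, minor) allele counts from genotype counts."""
    return 2 * n_hom_major + n_het, n_het + 2 * n_hom_minor


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# SG13S25 genotype counts (GG, GA, AA) as listed in Table 3.
controls = allele_counts(963, 236, 12)
cases = allele_counts(2949, 681, 27)

chi2, p = chi_square_2x2(*controls, *cases)
print(controls, cases)
print(f"chi2 = {chi2:.3f}, P = {p:.2f}")
```

The derived allele counts match the G and A columns of the table, and the resulting P value is close to the 0.33 reported for this SNP.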






SNP), XceI (SG13S89 SNP), TaqI (SG13S32 SNP), and BslLI
(SG13S41 SNP) (MBI Fermentas). DNA sequencing was used 
to test whether one or more additional polymorphisms were 
present in the probe-binding section of the amplicons, because 
they may interfere with TaqMan reactions and result in wrong 
genotype assignments. With each SNP, 100 DNA samples were 
examined by sequencing. The known SNPs were identified as 
the only sequence variabilities in the probe-binding regions. 
Thus the probability of genotyping errors due to possible fur- 
ther sequence variations was relatively low. 

Clinicians responsible for diagnosis were not aware of the 
genetic data. All genetic analyses were blinded. 



Statistical analysis 

The analysis consisted of comparisons of genotype, allele, and
haplotype frequencies between the control group and the group
of patients with MI. Because stronger associations of the HapA
haplotype with MI were observed in men compared to women in
both the Iceland and UK studies, 1 we also conducted separate
analyses of SNP genotype distributions and HapA and HapB haplotype
frequencies in the groups of men and women. Discrete
variables are expressed as counts (%) and compared using the χ²
test. Continuous variables are expressed as mean ± SD and compared
by means of the unpaired, two-sided t test. Haplotypes were
reconstructed from genotype data with the use of the software



Table 4

Genotype distributions of ALOX5AP SNPs in the women and men of the control group and the MI group

                          Women                                   Men
deCODE SNP ID  Genotype   Control group  MI group      P          Control group  MI group       P
                          (n = 598)      (n = 885)                (n = 613)      (n = 2772)
SG13S25        GG         465 (77.8%)    716 (80.9%)   0.13       498 (81.2%)    2233 (80.6%)   0.87
               GA         127 (21.2%)    166 (18.8%)              109 (17.8%)    515 (18.6%)
               AA         6 (1.0%)       3 (0.3%)                 6 (1.0%)       24 (0.9%)
SG13S377       GG         429 (71.7%)    659 (74.5%)   0.37       432 (70.5%)    2055 (74.1%)   0.13
               GA         157 (26.3%)    205 (23.2%)              168 (27.4%)    652 (23.5%)
               AA         12 (2.0%)      21 (2.4%)                13 (2.1%)      65 (2.3%)
SG13S100       AA         255 (42.6%)    372 (42.0%)   0.64       239 (39.0%)    1089 (39.3%)   0.10
               AG         261 (43.6%)    404 (45.6%)              272 (44.4%)    1310 (47.3%)
               GG         82 (13.7%)     109 (12.3%)              102 (16.6%)    373 (13.5%)
SG13S106       GG         291 (48.7%)    431 (48.7%)   0.66       277 (45.2%)    1266 (45.7%)   0.27
               GA         247 (41.3%)    377 (42.6%)              261 (42.6%)    1227 (44.3%)
               AA         60 (10.0%)     77 (8.7%)                75 (12.2%)     279 (10.1%)
SG13S114       TT         270 (45.2%)    403 (45.5%)   0.65       256 (41.8%)    1188 (42.9%)   0.53
               TA         256 (42.8%)    389 (44.0%)              278 (45.4%)    1271 (45.9%)
               AA         72 (12.0%)     93 (10.5%)               79 (12.9%)     313 (11.3%)
SG13S89        GG         545 (91.1%)    813 (91.9%)   0.84       548 (89.4%)    2519 (90.9%)   0.37
               GA         52 (8.7%)      70 (7.9%)                63 (10.3%)     249 (9.0%)
               AA         1 (0.2%)       2 (0.2%)                 2 (0.3%)       4 (0.1%)
SG13S32        AA         147 (24.6%)    221 (25.0%)   0.51       154 (25.1%)    703 (25.4%)    0.89
               AC         315 (52.7%)    442 (49.9%)              307 (50.1%)    1360 (49.1%)
               CC         136 (22.7%)    222 (25.1%)              152 (24.8%)    709 (25.6%)
SG13S41        AA         524 (87.6%)    773 (87.3%)   0.99       523 (85.3%)    2393 (86.3%)   0.75
               AG         72 (12.0%)     109 (12.3%)              87 (14.2%)     369 (13.3%)
               GG         2 (0.3%)       3 (0.3%)                 3 (0.5%)       10 (0.4%)
SG13S35        GG         472 (78.9%)    722 (81.6%)   0.19       505 (82.4%)    2303 (83.1%)   0.59
               GA         114 (19.1%)    154 (17.4%)              104 (17.0%)    441 (15.9%)
               AA         12 (2.0%)      9 (1.0%)                 4 (0.7%)       28 (1.0%)

Variables are presented as number (%) of genotypes in control individuals and MI patients.
SNPs, single nucleotide polymorphisms; MI, myocardial infarction.



Genetics IN Medicine

ALOX5AP polymorphisms and risk of MI



package PHASE. 19 We tested for the independent association ef- 
fect of nine-marker haplotypes in multiple logistic regression 
models of MI that included as covariates age, gender, history of 
arterial hypertension, history of hypercholesterolemia, current 
cigarette smoking, and diabetes mellitus. Adjusted odds ratios and 
95% Wald confidence intervals were calculated based on these 
models. Statistical significance was set at P < 0.05. 
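
The χ² comparison of discrete variables described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes a Pearson χ² test without continuity correction on a 2x2 table (the article does not say whether a correction was applied), and the function name `chi_square_2x2` is hypothetical. The counts are the HapA and HapB haplotype counts reported in Table 5 of this article.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, p_value) for 1 degree of freedom.
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom the upper-tail probability is erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Haplotype totals and carrier counts taken from Table 5.
CONTROL_TOTAL, MI_TOTAL = 2422, 7314
hapa_stat, hapa_p = chi_square_2x2(359, CONTROL_TOTAL - 359, 1171, MI_TOTAL - 1171)
hapb_stat, hapb_p = chi_square_2x2(182, CONTROL_TOTAL - 182, 518, MI_TOTAL - 518)

print(f"HapA: chi2 = {hapa_stat:.2f}, P = {hapa_p:.2f}")
print(f"HapB: chi2 = {hapb_stat:.2f}, P = {hapb_p:.2f}")
```

Under these assumptions the P values come out close to the 0.16 (HapA) and 0.48 (HapB) listed in Table 5.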

RESULTS 

The main baseline characteristics of the control group (n = 1211)
and the group of patients with MI (n = 3657) are shown in Table 2. 
Mean age of the MI patients was higher than that of the control 
group; the proportion of women was lower in the patient group 
than in the control group; and history of arterial hypertension and 
hypercholesterolemia, current cigarette smoking, and diabetes 
mellitus were more prevalent in the MI patient group than in the 
control group (P < 0.0001 for all comparisons; Table 2). 

Genotype distributions and allele frequencies of the ALOX5AP 
SNPs were not significantly different between the control group 
and patient group (Table 3). Significant sex-related differences of 
the genotype distributions were not found (Table 4). 

Figure 1 shows the linkage disequilibrium (LD) block struc- 
ture defined by the nine genotyped SNPs. Strong LD was 



[Figure: ALOX5AP gene map, Exon 1 through Exon 5]

Fig. 1. Genetic diversity at the ALOX5AP genomic region located in the long arm of
chromosome 13 (band q12). The exon-intron structure was adapted from sequence data
deposited in the NCBI nucleotide database (http://www.ncbi.nlm.nih.gov/entrez/) under
accession number AL512642, version of May 18, 2005. The values within squares show the
pairwise correlations between single nucleotide polymorphisms (SNPs) (measured as D')
defined at the top left and top right sides of the squares. Squares without a number
indicate D' = 1.00. SNP designations: AP1 = SG13S25, AP2 = SG13S377, AP3 =
SG13S100, AP4 = SG13S106, AP5 = SG13S114, AP6 = SG13S89, AP7 = SG13S32, AP8 =
SG13S41, AP9 = SG13S35.

Table 5

Haplotype   Control group        MI group             P
            (2422 haplotypes)    (7314 haplotypes)
HapA        359 (14.8)           1171 (16.0)          0.16
HapB        182 (7.5)            518 (7.1)            0.48

Haplotype frequencies are presented as number (%). The HapA haplotype is
defined by the alleles SG13S25-G, SG13S114-T, SG13S89-G, and SG13S32-A,
and the HapB haplotype is defined by the alleles SG13S377-A, SG13S114-A,
SG13S41-A, and SG13S35-G (Helgadottir et al.1).
MI, myocardial infarction.



present across the ALOX5AP region (Fig. 1). Frequencies of 
the HapA (SG13S25-G, SG13S114-T, SG13S89-G, SG13S32-A) 
and HapB (SG13S377-A, SG13S114-A, SG13S41-A, SG13S35-G) 
haplotypes were not substantially different between the control 
group and the patient group (Table 5). Risk estimates were 1.10 
(95% CI: 0.96-1.25) for the HapA haplotype and 0.94 (95% 
CI: 0.79-1.12) for the HapB haplotype. Haplotypes defined by
nine SNPs were not present at significantly different propor- 
tions among the control individuals and patients, with the ex- 
ception of the Hap5 haplotype, which showed a moderately 
higher frequency in the control group than in the patient group 
(Table 6). 
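
The unadjusted risk estimates quoted above can be approximated from the Table 5 counts. The sketch below assumes the estimates are simple odds ratios with 95% Wald confidence intervals computed on the log scale; the article does not spell out the formula for these unadjusted figures, and the function name `odds_ratio_wald` is hypothetical.

```python
import math


def odds_ratio_wald(case_with, case_without, ctrl_with, ctrl_without, z=1.96):
    """Odds ratio with a Wald confidence interval computed on the log scale."""
    odds_ratio = (case_with * ctrl_without) / (case_without * ctrl_with)
    se_log_or = math.sqrt(
        1 / case_with + 1 / case_without + 1 / ctrl_with + 1 / ctrl_without
    )
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Haplotype counts from Table 5 (MI group: 7314 haplotypes; control group: 2422).
hapa = odds_ratio_wald(1171, 7314 - 1171, 359, 2422 - 359)
hapb = odds_ratio_wald(518, 7314 - 518, 182, 2422 - 182)

for name, (estimate, low, high) in (("HapA", hapa), ("HapB", hapb)):
    print(f"{name}: OR = {estimate:.2f} (95% CI: {low:.2f}-{high:.2f})")
```

With these inputs the sketch gives values that round to the figures in the text, 1.10 (0.96-1.25) for HapA and 0.94 (0.79-1.12) for HapB.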

The frequencies of the HapA, HapB, and nine-marker hap- 
lotypes were not significantly different between the women of 
the control and MI groups and between the men of the two 
groups. In addition, we did not observe significant differences 
in age or sex between the carriers and noncarriers of specific 
haplotypes in the control group. 

To assess whether independent associations existed between
nine-marker haplotypes and MI, we performed a multivariate
logistic regression analysis. After adjustments were made for
conventional cardiovascular risk markers (age, gender, history
of arterial hypertension, history of hypercholesterolemia, current
cigarette smoking, diabetes mellitus), the analysis showed
that none of the five most frequent nine-marker haplotypes,
including the Hap5 haplotype, or the combined other haplotypes
were significantly related with MI (P ≥ 0.38).

Table 6

Frequencies of nine-marker haplotypes in the control and MI groups

Haplotype
Name    Allele combination   Control group        MI group             P
                             (2422 haplotypes)    (7314 haplotypes)
Hap1    GGAGTGCAG            928 (38.3)           2805 (38.4)          0.98
Hap2    GGAGTGAAG            237 (9.8)            808 (11.0)           0.082
Hap3    AGAGTGAAG            253 (10.4)           705 (9.6)            0.25
Hap4    GGGAAGAAG            204 (8.4)            665 (9.1)            0.32
Hap5    GAGAAGAAA            188 (7.8)            479 (6.5)            0.041
Other                        612 (25.3)           1852 (25.3)          0.96

Haplotype frequencies are presented as number (%). Shown are results obtained
from the five most frequent nine-marker haplotypes and the combined
other nine-marker haplotypes. Each haplotype is defined as a specific allele
combination based on nine single nucleotide polymorphisms (SNPs) in
ALOX5AP. The order of the SNPs is as follows (from left to right): SG13S25,
SG13S377, SG13S100, SG13S106, SG13S114, SG13S89, SG13S32, SG13S41,
SG13S35. Overall P = 0.12. See Table 1 and Figure 1 for the locations of the
SNPs in the ALOX5AP genomic region.
MI, myocardial infarction.

DISCUSSION 

The present data show that specific SNPs in ALOX5AP are
not associated with MI in a large German population. Analyses
in women and men did not reveal sex-specific relationships
between the SNPs and MI. Specific four-marker haplotypes,
the HapA and HapB haplotypes, and nine-marker haplotypes
were not associated with MI. Most SNPs were in strong LD and
the LD block structure was similar to those in other white
populations.4,14 Three of the nine SNPs examined here, the
SG13S100, SG13S106, and SG13S114, were found to be significantly
associated with MI in a population from Iceland that
consisted of 779 unrelated patients with MI and 624 population-based
control individuals.1 However, significant associations
between these SNPs and MI were not observed when
adjustments were made for the number of markers tested.1
None of the three SNPs was associated with MI in the present
population. The HapA haplotype was associated with a two-fold
greater risk of MI in the Icelandic population (nominal
P = 0.0000023; adjusted P = 0.005) but not in a sample from
the UK (753 patients with MI and 730 control individuals).1 In
the same UK population, the HapB haplotype was associated
with MI (nominal P = 0.00037; adjusted P = 0.046).1

The control subjects had some indication for coronary angiography,
and, therefore, they did not constitute a typical
sample of healthy controls. We compared the frequencies of
the HapA haplotype and the SNP alleles that define the HapA
haplotype between the present control sample and an independent
control group that consisted of 736 unrelated individuals
from the KORAS2000 sample, a representative local population
sample from southern Germany.4 In the present control
group and the control group from the KORAS2000 sample,4
the frequencies of the HapA haplotype were 14.8% versus
15.2% (P = 0.74) and the frequencies of the SG13S25-G,
SG13S114-T, SG13S89-G, and SG13S32-A alleles were 89.3%
vs. 90.1% (P = 0.42), 65.5% vs. 65.0% (P = 0.77), 95.0% vs.
96.0% (P = 0.15), and 50.5% vs. 49.7% (P = 0.62), respectively.
Thus, with regard to the frequencies of the HapA haplotype
and the alleles that constitute the HapA haplotype, the
present control group is not substantially different from an
established population-based sample. We inferred from this
finding that the control sample with coronary angiography was
suitable for the genetic association study described here. Measures
of inflammation were not examined, which is a limitation
of the current study.

Relationships of ALOX5AP SNPs and haplotypes with MI
and ischemic stroke were evaluated in a nested case-control
study within the Physicians' Health Study cohort that comprised
predominantly white (>94%) male US physicians.14,20
Investigation of 341 MI case-control pairs did not provide evidence
of an association of any of the tested SNPs or the HapA
or HapB haplotype with MI.14 Genotype distributions and frequencies
of SNP alleles and the HapA and HapB haplotypes in
the case and control groups of the US sample14 corresponded
well with those of the present German sample.

Similar to results obtained in Germans (this study) and US
physicians,14 the SNPs that define the HapA and HapB haplotypes
were not associated with MI in a Japanese population
that included 353 patients with MI and 1875 control
individuals.2 A meaningful association analysis of the HapA
and HapB haplotypes was not possible in the sample from
Japan because, with some of the SNPs, minor alleles were either
absent or extremely rare.2 Two-marker ALOX5AP haplotypes
not related to the HapA and HapB haplotypes were associated
with MI in the Japanese sample.2

Studies conducted with samples of white individuals provided
heterogeneous results about the relationship of the
HapA and HapB haplotypes with MI (Table 7). Association of
the HapA haplotype with MI was observed in a study sample
from Iceland, but this finding was not replicated in samples
from Germany (present study), the UK, and the US.1,14 A relationship
of the HapB haplotype with MI was found in a study
sample from the UK, but this result was not confirmed in samples
from Germany (present study) and the US.1,14 Heterogeneities
of genetic and environmental factors across the source
populations are unlikely to account for the inconsistencies.
Genetic markers for proposed gene-disease associations may vary
in frequency between populations, but there is empirical evidence
that their biological impact on the risk of common diseases is


Table 7

Frequencies of the HapA and HapB haplotypes and estimated risks of [illegible] population samples

                         HapA                                HapB
Study                    Controls/cases   Risk   P           Controls/cases      Risk   P
Germany (present)        0.15/0.16        1.10   0.16        0.08/0.07           0.94   0.48
United States (ref. 14)  0.14/0.17        1.18   0.46        0.07/0.06           0.62   0.08
Iceland (ref. 1)         0.10/0.16        1.80   <0.005a     No data available
United Kingdom (ref. 1)  0.15/0.17        n.s.               0.04/0.08           1.95   0.046a

Haplotype frequencies are presented as proportions of controls and cases; n.s., not significant (data not shown).1
aAdjusted for the number of haplotypes tested.1






usually consistent even across ethnic boundaries.21 Consistent
replication of genetic associations has been difficult to achieve,
despite the biological plausibility of these associations.22 In this
context, the present findings argue against association of defined
SNPs and haplotypes of ALOX5AP with MI.

ACKNOWLEDGMENTS 

The work was entirely supported by institutional financing 
from the German Heart Center, Munich. The authors thank
Wolfgang Latz, Marianne Eichinger, and Claudia Ganser for 
excellent technical assistance. 

References 

1. Helgadottir A, Manolescu A, Thorleifsson G, Gretarsdottir S, et al. The gene encoding
5-lipoxygenase activating protein confers risk of myocardial infarction and
stroke. Nat Genet 2004;36:233-239.

2. Kajimoto K, Shioji K, Ishida C, Iwanaga Y, et al. Validation of the association between
the gene encoding 5-lipoxygenase-activating protein and myocardial infarction
in a Japanese population. Circ J 2005;69:1029-1034.

3. Helgadottir A, Gretarsdottir S, St Clair D, Manolescu A, et al. Association between
the gene encoding 5-lipoxygenase-activating protein and stroke replicated in a Scottish
population. Am J Hum Genet 2005;76:505-509.

4. Lohmussaar E, Gschwendtner A, Mueller JC, Org T, et al. ALOX5AP gene and the
PDE4D gene in a Central European population of stroke patients. Stroke 2005;36:
731-736.

5. Miller DK, Gillard JW, Vickers PJ, Sadowski S, et al. Identification and isolation of a
membrane protein necessary for leukotriene production. Nature 1990;343:278-281.

6. Dixon RAF, Diehl RE, Opas E, Rands E, et al. Requirement of a 5-lipoxygenase-activating
protein for leukotriene synthesis. Nature 1990;343:282-284.

7. Libby P, Ridker PM, Maseri A. Inflammation and atherosclerosis. Circulation 2002;
105:1135-1143.

8. Dwyer JH, Allayee H, Dwyer KM, Fan J, et al. Arachidonate 5-lipoxygenase promoter
genotype, dietary arachidonic acid, and atherosclerosis. N Engl J Med 2004;
350:29-37.

9. De Caterina R, Zampolli A. From asthma to atherosclerosis - 5-lipoxygenase, leukotrienes,
and inflammation. N Engl J Med 2004;350:4-7.

10. Spanbroek R, Grabner R, Lotzer K, Hildner M, et al. Expanding expression of the
5-lipoxygenase pathway within the arterial wall during human atherogenesis. Proc
Natl Acad Sci U S A 2003;100:1238-1243.

11. Zhao L, Funk CD. Lipoxygenase pathways in atherogenesis. Trends Cardiovasc Med
2004;14:191-195.

12. Kennedy BP, Diehl RE, Boie Y, Adam M, et al. Gene characterization and promoter
analysis of the human 5-lipoxygenase-activating protein (FLAP). J Biol Chem 1991;
266:8511-8516.

13. Yandava CN, Kennedy BP, Pillari A, Duncan AM, et al. Cytogenetic and radiation
hybrid mapping of human arachidonate 5-lipoxygenase-activating protein
(ALOX5AP) to chromosome 13q12. Genomics 1999;56:131-133.

14. Zee RYL, Cheng S, Hegener HH, Erlich HA, et al. Genetic variants of arachidonate
5-lipoxygenase-activating protein, and risk of incident myocardial infarction
and ischemic stroke. A nested case-control approach. Stroke 2006;37:2007-2011.

15. World Medical Association declaration of Helsinki. Recommendations guiding
physicians in biomedical research involving human subjects. JAMA 1997;277:925-926.

16. Chalmers J, MacMahon S, Mancia G, Whitworth J, et al. 1999 World Health
Organization-International Society of Hypertension Guidelines for the management of
hypertension. Guidelines sub-committee of the World Health Organization. Clin
Exp Hypertens 1999;21:1009-1060.

17. Diabetes mellitus. World Health Organization Study Group. Diabetes mellitus.
WHO Tech Rep Ser 1985;727:1-104.

18. Kutyavin IV, Afonina IA, Mills A, Corn W, et al. 3'-Minor groove binder-DNA
probes increase sequence specificity at PCR extension temperatures. Nucleic Acids
Res 2000;28:655-661.

19. Stephens M, Smith NJ, Donnelly P. A new statistical method for haplotype
reconstruction from population data. Am J Hum Genet 2001;68:978-989.

20. Final report on the aspirin component of the ongoing Physicians' Health Study.
Steering committee of the Physicians' Health Study research group. N Engl J Med
1989;321:129-135.

21. Ioannidis JPA, Ntzani EE, Trikalinos TA. 'Racial' differences in genetic effects for
complex diseases. Nat Genet 2004;36:1312-1318.

22. Hirschhorn JN, Lohmueller K, Byrne E, Hirschhorn K. A comprehensive review of
genetic association studies. Genet Med 2002;4:45-61.






ORIGINAL CONTRIBUTION 




Nonvalidation of Reported Genetic Risk 
Factors for Acute Coronary Syndrome 
in a Large-Scale Replication Study 



Context Given the numerous, yet inconsistent, reports of genetic variants being
associated with acute coronary syndromes (ACS), there is a need for comprehensive
validation of ACS susceptibility genotypes.

Objective To perform an extensive validation of putative genetic risk factors for ACS.

Design, Setting, and Participants Through a systematic literature search of articles
published before March 10, 2005, we identified genetic variants previously reported as
significant susceptibility factors for atherosclerosis or ACS. Restricting our analysis to white
patients to reduce confounding from racial admixture, we identified 811 patients who
presented from March 2001 through June 2003 with ACS at 2 Kansas City, Mo,
university-affiliated hospitals. During 2005-2006, we genotyped the 811 patients along with 650
age- and sex-matched controls for 85 variants in 70 genes and attempted to replicate
previously reported associations. We further explored possible associations without prior
assumption of specific risk models and used the Sign test to search for weak associations.

Main Outcome Measures Compare each prespecified gene variant associated with
ACS risk among cases and controls. A surplus of associations would imply that some
are associated with ACS.

Results Of 85 variants tested, only 1 putative risk genotype (-455 promoter variant
in β-fibrinogen) was nominally statistically significant (P=.03). Only 4 additional genes
were positive in model-free analysis. Neither number of associations was more frequent
than expected by chance, given the number of comparisons. Finally, only 41 of
84 predefined risk variants were even marginally more frequent in cases than in controls
(with 1 tie), representing a 48.8% "win rate" (95% confidence interval,
38.1%-59.5%) for the collective risk genotypes (P=.91, Sign test).

Conclusions Our null results provide no support for the hypothesis that any of the
85 genetic variants tested is a susceptibility factor for ACS. These results emphasize
the need for robust replication of putative genetic risk factors before their introduction
into clinical care.

JAMA. 2007;297:1551-1561 www.jama.com
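
The "win rate" figures in the Results paragraph of the abstract can be reproduced with a short calculation. This is a sketch under stated assumptions rather than the authors' procedure: it treats the Sign test as an exact two-sided binomial test against a 50% win rate with all 84 comparisons in the denominator (as the quoted 48.8% implies), and uses a normal-approximation (Wald) interval for the proportion. The function names are hypothetical. The last line also evaluates the Bonferroni threshold of 0.05/84 stated in the Statistical Analysis section.

```python
import math


def sign_test_two_sided(wins, n):
    """Exact two-sided sign test P value against a 50% win rate."""
    pmf = [math.comb(n, k) / 2 ** n for k in range(n + 1)]
    lower_tail = sum(pmf[: wins + 1])  # P(X <= wins)
    upper_tail = sum(pmf[wins:])       # P(X >= wins)
    return min(1.0, 2 * min(lower_tail, upper_tail))


def wald_interval(wins, n, z=1.96):
    """Normal-approximation confidence interval for a proportion."""
    rate = wins / n
    half_width = z * math.sqrt(rate * (1 - rate) / n)
    return rate - half_width, rate + half_width


WINS, COMPARISONS = 41, 84  # figures quoted in the abstract
win_rate = WINS / COMPARISONS
p_value = sign_test_two_sided(WINS, COMPARISONS)
low, high = wald_interval(WINS, COMPARISONS)
bonferroni_threshold = 0.05 / COMPARISONS

print(f"win rate = {win_rate:.1%}, 95% CI {low:.1%}-{high:.1%}, P = {p_value:.2f}")
print(f"Bonferroni threshold = {bonferroni_threshold:.4f}")
```

Under these assumptions the output agrees with the abstract: a 48.8% win rate, an interval of 38.1% to 59.5%, and P = .91.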



Thomas M. Morgan, MD

Harlan M. Krumholz, MD, MS

Richard P. Lifton, MD, PhD

John A. Spertus, MD, MPH

COMPELLING EVIDENCE FROM 
twin and epidemiological 
studies suggests a genetic ba- 
sis for atherosclerotic heart 
disease and acute coronary syn- 
dromes (ACS), including unstable an- 
gina, non-ST-elevation myocardial in- 
farction (NSTEM1), and ST-elevation 
myocardial infarction (STEM1). 1 ' 2 To 
date, numerous candidate genes have 
been implicated, mainly by case- 
control studies, as potential cardiovas- 
cular risk factors, but few, if any, have 
been established definitively. 3 ' 5 Fac- 
tors undermining the validity of pre- 
vious reports include inappropriately 
small sample sizes, multiple subgroup 
comparisons, and publication bias. 4 

Before use in clinical care, potential 
genetic risk factors would ideally be 
replicated en masse in large, well- 
characterized patient populations. 6 To 
date, no such comprehensive valida- 
tion of genetic variants potentially as- 
sociated with ACS or atherosclerosis has 
been reported. 

Accordingly, we first sought to 
identify genetic associations with ACS 
by systematically searching the medi- 
cal literature for variants reported in 
association with MI, unstable angina,
or atherosclerosis. We then attempted 
to validate these putative genetic risks 
in a large case-control study. 



METHODS 
Candidate Genes 

We searched PubMed and bibliogra- 
phies of original and review articles 
for manuscripts published before 
March 10, 2005, that reported statisti- 
cally significant associations between 
specific genotypes and coronary ath- 
erosclerosis or ACS (A list of the 
articles is available on request from 
the authors). MEDLINE search terms 



Author Affiliations: Department of Genetics, Howard 
Hughes Medical Institute (Drs Morgan and Lifton), 
Robert Wood Johnson Clinical Scholars Program and 
Department of Internal Medicine (Dr Krumholz), Yale 
University School of Medicine, New Haven, Conn, and 
Mid-America Heart Institute and University of Missouri-Kansas
City, Mo (Dr Spertus). Dr Morgan is now with
the Department of Pediatrics, Division of Genetics and 
Genomic Medicine, Washington University School of 
Medicine, St Louis, Mo. 

Corresponding Author: Thomas M. Morgan, MD,
Washington University School of Medicine, McDonnell
Pediatric Research Bldg, 3103, 660 Euclid Ave, St Louis,
MO 63110 (email: morgan_t@kids.wustl.edu) or
Richard P. Lifton, MD, PhD, Yale University School of 
Medicine, 295 Congress Ave, New Haven, CT 06510 
(richard.lifton@yale.edu). 



©2007 American Medical Association. All rights reserved. (Reprinted) JAMA, April 11, 2007, Vol 297, No. 14



GENETIC RISK FACTORS FOR ACUTE CORONARY SYNDROMES 



included: gene, genetic, polymorphism, 
myocardial infarction, atherosclerosis, 
coronary heart disease, and coronary 
artery disease. Reports were included 
if they contained a claim of a signifi- 
cant positive association, with an 
investigator-reported P value <.05. A 
total of 96 polymorphic genetic vari- 
ants in 75 genes were identified and 
included (Table 1 and Table 2). 
Eleven of those were excluded because 
they had failed the multiplex genotyp- 
ing assay. 

Description of Cases and Controls 

Eight hundred eleven white patients of
European ancestry with ACS were
identified from a consecutive series of
patients presenting at 2 Kansas City,
Mo, hospitals (Mid-America Heart
Institute and Truman Medical Center)
from March 2001 through June
2003. Standard definitions were used
to diagnose ACS patients with either
MI or unstable angina.92,93 Myocardial
infarction was defined by a positive
troponin blood test in the setting of
symptoms and electrocardiogram
changes (both ST-segment elevation
and non-ST-segment elevation
changes) consistent with MI. Unstable
angina diagnoses were confirmed, by
concurrence of 3 physician chart
reviewers, if patients had negative troponin
blood tests and any one of the
following: new onset angina (<2
months) of at least Canadian Cardiovascular
Society Classification class III,
prolonged (>20 minutes) rest angina,
recent (<2 months) worsening of
angina, or angina that occurred within
2 weeks of an MI.93 Of the troponin-negative
unstable angina patients, 203
(92.7%) had a cardiac catheterization,
a nuclear stress test, or a stress echocardiogram
to corroborate their
diagnoses.

Each participating inpatient with 
ACS was interviewed to determine vari- 
ables, such as smoking, alcohol use, 
family history (>1 first-degree rela- 
tives with MI or coronary artery dis- 
ease), and to obtain consent for a blood 
sample for genetic analysis. In addi- 
tion, detailed chart abstractions were 



performed to collect relevant labora- 
tory and clinical data. 

A total of 1045 ACS patients (of 
which 811 white patients were included
in the current study) agreed to partici- 
pate and to provide a blood sample for 
genetic analysis. Patients self-reported 
their race/ethnicity by selecting one of 
the following descriptors that were pro- 
vided by the investigators: white, white 
Hispanic, African American, and Afri- 
can American non-Hispanic. Age- and 
sex-matched controls were recruited 
from the ambulatory outpatient clini- 
cal laboratory of 1 of the centers, Saint 
Luke's Hospital of Kansas City. These 
patients were undergoing routine labo- 
ratory testing and were asked to com- 
plete a medical questionnaire defining 
cardiac risk factors and medical co- 
morbidities. Those controls reporting 
a previous ACS, prior coronary artery 
bypass graft surgery or prior percuta- 
neous coronary intervention were 
excluded. To minimize the potential 
impact of genetic admixture, 650 white 
controls of mixed European ancestry 
who reported no history of coronary 
artery disease were selected from among 
the 1054 potential controls. Risk fac- 
tor data were missing for 9 sex-, age-, 
and race-matched unaffected con- 
trols, and 56 additional matched 
controls were used for ALOX5AP 
haplotyping. 

The research protocol was ap- 
proved by the institutional review 
boards of both institutions; all study 
participants provided written in- 
formed consent for clinical and ge- 
netic studies. 

Genotyping 

Genomic DNA was isolated (Gentra
PUREGENE, Minneapolis, Minn) from
blood samples and subjected to
whole genome amplification by multiple-strand
displacement (Molecular
Staging Inc, New Haven, Conn), using
random priming and Phi-29 polymerase.94,95
Genotyping was performed
using the Sequenom MALDI-TOF
(Matrix Assisted Laser Desorption-Ionization
Time-of-Flight) system,
using Spectrodesign software for assay
design (Sequenom, San Diego,
Calif), and assay methods that have previously
been described.96,97 Gene variants
were excluded from analysis if they
could not be genotyped using the Sequenom
system due to persistent assay
failure, defined as less than 95%
scorable genotypes after 4 multiplex reaction
design cycles. Eleven assays were
ultimately excluded.* For the rare
MEF2A 21-base pair (bp) deletion,
cases and controls were genotyped by
polymerase chain reaction to generate
amplicons of 152-bp nondeletion or
131-bp deletion followed by electrophoresis
on 3% agarose gels. Identified
deletions were confirmed by direct
DNA sequencing. Due to its rarity,
MEF2A was analyzed separately, and
thus only the other 84 genes were subjected
to the full set of statistical analyses.
PHASE Version 2.1 was used to estimate
haplotype frequencies for
ALOX5AP.101,102

Statistical Analysis 

Genotype distributions in cases and
controls were examined for significant
deviation (P<.05) from Hardy-Weinberg
equilibrium. The number
of departures was assessed by Monte
Carlo simulation and compared with
the number expected by chance
alone (Resampling Stats Inc, College
Park, Md).
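
A per-variant Hardy-Weinberg check of the kind described above can be sketched as follows. This is an illustration only: it uses the asymptotic 1-degree-of-freedom χ² goodness-of-fit test, not the Monte Carlo assessment the authors describe, and the function name `hardy_weinberg_chi_square` is hypothetical. Because this article does not list genotype counts in the text, the example borrows the SG13S25 genotype counts for control women from Table 4 of the Koch et al. article reproduced earlier in this document.

```python
import math


def hardy_weinberg_chi_square(n_hom_major, n_het, n_hom_minor):
    """Chi-square goodness-of-fit test for Hardy-Weinberg proportions (1 df)."""
    n = n_hom_major + n_het + n_hom_minor
    p = (2 * n_hom_major + n_het) / (2 * n)  # major allele frequency
    q = 1 - p
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_hom_major, n_het, n_hom_minor)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # With 1 degree of freedom the upper-tail probability is erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Illustrative input: SG13S25 GG/GA/AA counts in control women (Koch et al., Table 4).
hwe_stat, hwe_p = hardy_weinberg_chi_square(465, 127, 6)
deviates = hwe_p < 0.05  # the P<.05 criterion named in the text

print(f"chi2 = {hwe_stat:.2f}, P = {hwe_p:.2f}, significant deviation: {deviates}")
```

Repeating the test across all variants and counting the departures gives the quantity that the authors compared with its chance expectation.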

In the primary analysis, each genetic
variant was prespecified based on
published reports, and the frequencies
of risk-associated variants were
compared in cases and controls by using
a 100 000 iteration Monte Carlo extension
of the χ² test (SPSS 13.0 Exact
Tests, SPSS Inc, Chicago, Ill). The term
statistically significant was reserved
for a P value below the Bonferroni-corrected
study-wide significance
threshold (0.05/84=0.0006). Because
the Bonferroni correction is conservative
when applied to a replication study,
the total number of all positive associations
at the P<.05 level was also
compared with the expected number by
chance in 100 000 simulations. A

*References 13, 20, 42, 45, 72, 88, 98-100.






Table 1. Validation of Predefined Risk Genotype Comparisons in Cases vs Controls

Gene Symbol  Variant        Ref          Genotype                Risk Variant       Odds Ratio              2-Tailed     Genotype Frequency
                                         Comparison              Control Frequency  (95% CI)                P Value      Difference
ABCA1        -477C/T        [illegible]  [illegible]             0.222              1.13 (0.88-1.45)        .35          0.0215
ABCA1        Lys219Arg      [illegible]  A vs G                  0.274              1.02 (0.87-1.20)        .83          0.0041
ACE1         indel          [illegible]  DD vs DI or II          0.286              1.07 (0.85-1.35)        .60          0.0139
ADD1         Gly460Trp      [illegible]  [illegible]             [illegible]        1.09 (0.91-1.32)        .35          0.0148
ADRB2§       Glu27Gln       13           G vs C                  [illegible]        [illegible]             .34          -0.0186
ADRB2        Ile164Thr      13           CC vs CT                [illegible]        [illegible]             .10          -0.0117
ADRB2        Gly16Arg       13           A vs G                  [illegible]        [illegible]             .67          0.0082
ADRB3        Arg64Trp       14           C vs T                  [illegible]        [illegible]             >.99         -0.0004
AGT          Thr235Met      15           T vs C                  [illegible]        [illegible]             .65          0.0086
AGTR1        A1166C         16           CC vs CA or AA          [illegible]        [illegible]             .72          0.0063
ALOX5AP      HAP B          17           HAP B vs non-B          [illegible]        [illegible]             .31          0.0120
ALOX5AP      HAP A          17           HAP A vs non-A          [illegible]        [illegible]             .32          -0.0130
APOA1        C83T           18           T vs C                  [illegible]        [illegible] (0.81-1.20) .92          -0.0019
APOA1        -75G/A         19           A vs G                  [illegible]        [illegible]             .23          0.0037
APOE         Arg158Cys      20           CC vs CT or TT          [illegible]        [illegible]             .17          0.0134
APOE         -219T/G        21           T vs G                  [illegible]        [illegible]             .08          0.0329
BDKRB2       -58C/T         22           C vs T                  [illegible]        [illegible]             .37          -0.0169
[illegible]  Thr23Ala       23           C vs T                  [illegible]        [illegible]             .80          -0.0036
CCR2         Val64Ile       24           GG vs AG or AA          [illegible]        [illegible]             .71          -0.0082
CCR5         Indel          25           I vs D                  0.907              [illegible]             .05          -0.0224
CD14         -159C/T        26           TT vs CT or CC          [illegible]        [illegible]             .46          0.0170
CETP         intron1 G/A    27           G vs A                  0.568              [illegible]             [illegible]  -0.0169
CETP         -629C/A        28           C vs A                  0.505              [illegible]             [illegible]  -0.0104
COMT†        Val158Met      29           GG or AG vs AA          0.222              [illegible]             .41          0.0193
CX3CR1       Ile249Val      30,31        C vs T                  0.729              [illegible]             .62          -0.0088
CX3CR1       Thr280Met      30           G vs A                  0.831              [illegible]             [illegible]  0.0080
CYP11B2      -344T/C        32,33        C vs T                  [illegible]        [illegible]             .27          0.0204
CYP2C9       Leu359Ile      34,35        AC vs AA                0.094              [illegible]             .17          -0.0208
CYP2C9*†     Cys144Arg      34,35        CC vs CT                [illegible]        [illegible]             .95          0.0019
ENPP1        Gln121Lys      36           C vs A                  0.130              [illegible]             [illegible]  0.0076
ESR1         -401T/C        37,38        TT vs CT or CC          0.285              [illegible]             .64          0.0118
F12          46C/T          39           TT vs CT or CC          0.067              [illegible]             [illegible]  -0.0051
F13A1        Val34Leu       40           G vs T                  [illegible]        [illegible]             .79          0.0045
F2           G20210A        40,41        A vs G                  [illegible]        [illegible]             .88          -0.0013
F5           Arg506Gln      40           A vs G                  0.025              [illegible]             .81          -0.0016
F7†          Arg353Gln      42           G vs A                  [illegible]        [illegible]             .63          -0.0062
FGB          -455A/G        43           GG or AG vs AA          [illegible]        [illegible]             .03          0.0558
GJA4         C1019T         44           T vs C                  [illegible]        [illegible]             .22          -0.0215
GP1BA        -5T/C          44,46        T vs C                  [illegible]        [illegible]             .57          -0.0071
GRL          Asn363Ser      47           AG vs AA                [illegible]        [illegible]             .29          -0.0146
HFE          Cys282Tyr      46           A vs G                  [illegible]        [illegible]             .94          -0.0010
HTR2A*       Ser102Ser      [illegible]  TT vs CT or [illegible] [illegible]        [illegible]             .47          0.0154
ICAM1        Lys469Glu      50           A vs G                  [illegible]        [illegible]             .26          0.0214
IL1B         -511C/T        [illegible]  CC vs CT or TT          0.445              1.14 (0.92-1.41)        .24          0.0328


IL6 


-174G/C 52 ' 53 


CvsG 


0.403 


1.05(0.91-1.22) 


.50 


0.0127 


IRS1 


Arg97lGly 64 


AvsG 


0.059 


0.96 (0.70-1.32) 


.81 


-0.0023 


ITGA2 


Phe807Phe 5556 


. AvsG 


0.391 


1.03(0.88-1.19) 


.73 


0.0063 


ITGB3 


Leu33Pro 4 


CvsT 


0.164 


0.85 (0.70-1.05) 


• 14. 


-0.0203 


UPC 


. -514T/C 5758 


TvsC 


0.240 


0.85 (0.71-1.01) 


.07 


-0.0286 


LPA 


AspgAsn 50 


AG vs GG 


0.026 


1.43 (0.78-2.63) 


.29 


0.0107 


LRP1 


1Tir3261"mr ao 


GG or AG vs AA 


0.107 


1.02 (0.73-1.42) 


.93 


0.0018 



(continued) 



©2007 American Medical Association. All rights reserved. (Reprinted) JAMA, April 11, 2007, Vol 297, No. 14

Downloaded from www.jama.com by KellyPucci, on April 11, 2007



GENETIC RISK FACTORS FOR ACUTE CORONARY SYNDROMES 



Table 1. Validation of Predefined Risk Genotype Comparisons in Cases vs Controls (cont)

Gene      Variant             Genotype         Risk Variant Odds Ratio         2-Tailed     Genotype Frequency
Symbol                        Comparison       Control Freq                    P Value      Difference
LTA       A252G 61,62         GG vs AG or AA   0.116        [illegible]        [illegible]  -0.0147
LTA       Thr26Asn 61,62      AA vs AC or CC   0.119        [illegible]        [illegible]  -0.0191
MGP       Thr83Ala 63         G vs A           0.385        [illegible]        [illegible]  0.0000
MGP       -7A/G               G vs A           0.636        [illegible]        [illegible]  -0.0010
MMP3      indel 64,65         DD vs DI or II   0.284        [illegible]        .10          [illegible]
MTHFR*    Ala222Val 66        TT vs CT or CC   0.100        [illegible]        [illegible]  0.0283
MTP       -493G/T 67,68       T vs G           0.255        [illegible]        [illegible]  [illegible]
MTR       Asp919Gly 69        A vs G           0.804        [illegible]        [illegible]  [illegible]
NPPA      Ter29ArgArg 70      CC vs CT or TT   0.023        [illegible]        [illegible]  [illegible]
OLR1      Lys167Asn 71        C vs G           0.916        [illegible]        [illegible]  [illegible]
p22-PHOX‡ His72Tyr 72,73      CC vs CT or TT   0.337        [illegible]        [illegible]  [illegible]
PAI1      Indel 43,44         DD vs DI or II   0.309        [illegible]        >.99         [illegible]
PECAM1    Leu125Val 74,75     GG vs CG or CC   0.288        0.94 (0.75-1.19)   [illegible]  [illegible]
PECAM1    Ser563Asn 74,75     A vs G           0.498        0.97 (0.84-1.13)   [illegible]  [illegible]
PON1      Gln192Arg 76        A vs G           0.705        1.01 (0.86-1.18)   .97          [illegible]
PON2      Cys311Ser 77        CC vs CG or GG   0.556        1.10 (0.89-1.35)   .40          [illegible]
PPARG     Ala12Pro 78         C vs G           0.129        0.83 (0.66-1.03)   .10          [illegible]
PTGS2     -765G/C 79          C vs G           0.168        0.85 (0.69-1.04)   .11          [illegible]
RECQL2    Arg1367Cys 80       T vs C           0.758        0.80 (0.68-0.94)   .01          [illegible]
SELE      Leu554Phe 75        T vs C           0.036        1.17 (0.80-1.71)   .44          0.0059
SELE      Ser128Arg 75        C vs A           0.103        0.91 (0.71-1.16)   .45          -0.0086
SELP      Thr715Pro 81        A vs C           0.902        0.93 (0.73-1.18)   .58          [illegible]
TFPI      Val264Met 82        AG vs GG         0.050        0.80 (0.49-1.32)   .44          -0.0096
THBD      -33G/A 83           AG vs GG         0.002        2.39 (0.25-23.0)   .63          [illegible]
THBD      Ala25Thr 84         AG vs [illegible] [illegible] 1.29 (0.50-3.35)   .64          0.0030
THBD      Ala455Val 85        CC vs CT or TT   0.659        1.05 (0.85-1.31)   .65          0.0114
THBS1     Asn700Ser 86        GG vs AG or AA   0.021        0.81 (0.38-1.72)   .70          -0.0039
THBS2†    3'UTR T/G 87        TT or GT vs GG   0.950        0.51 (0.34-0.79)   .002         -0.0432
THBS4     Ala387Pro 87        GG or CG vs CC   0.939        1.00 (0.65-1.53)   >.99         -0.0002
THPO      A5713G 88           GG vs AG or AA   0.264        1.20 (0.95-1.51)   .13          0.0368
TLR4      Gly299Asp 89        A vs G           0.942        1.02 (0.75-1.40)   .94          0.0011
TNF       -308G/A 90          A vs G           0.158        0.86 (0.70-1.06)   .17          -0.0188
TNFRSF1A  Arg92Gln 91         AG vs GG         0.050        0.80 (0.49-1.32)   .44          -0.0096

*Hardy-Weinberg equilibrium deviation in controls, P<.05 (n = 3).
†Hardy-Weinberg equilibrium deviation in cases, P<.05 (n = 5).
‡P<.001 (n = 2).
§P<.001 (n = 1).



surplus of positive associations over 
random expectations would imply that 
some are truly associated with ACS. 

Secondarily, we also compared the overall genotype distributions at each locus in cases
and controls by Monte Carlo χ² testing. Power to confirm individual genetic associations
was determined using a log-likelihood-based method (Quanto 1.0). 103,104

Finally, as a measure to increase power, the observed proportion of prespecified risk
variants found to be even marginally more frequent in cases than in controls was assessed
by the Sign test. Under the null hypothesis, each of the risk variants is equally likely
to be more frequent in cases or in controls. To estimate the Sign test's power to detect
an excess of even weakly positive genetic associations (50 of 84 positive associations
confers P=.05 in the Sign test), we simulated the resampling of 650 control and 811 case
genotypes across 84 genetic comparisons, finding the minimum detectable odds ratio
ensuring a critical probability level of a 63.3% win rate for each of the 84 risk
variants, which provides 80% confidence of having at least 50 wins.
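The two Sign test figures quoted above (50 of 84 wins for P=.05, and a 63.3% win rate for 80% confidence of at least 50 wins) can be checked with an exact binomial tail. The sketch below is not the authors' resampling simulation; it is a closed-form shortcut that assumes the 84 comparisons are independent and that the quoted P value is one-sided. The function and constant names are illustrative.

```python
from math import comb


def binom_tail(n: int, k: int, p: float) -> float:
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


N_VARIANTS = 84   # genetic comparisons, from the text
WINS_NEEDED = 50  # number of wins quoted in the text

# One-sided Sign test P value for 50 of 84 wins when each variant wins with
# probability 0.5 (the null hypothesis).
p_null = binom_tail(N_VARIANTS, WINS_NEEDED, 0.5)

# Probability of at least 50 wins if every variant wins with probability 0.633
# (assumed here to be the same for all 84 variants).
power = binom_tail(N_VARIANTS, WINS_NEEDED, 0.633)

if __name__ == "__main__":
    print(f"P(X >= 50 | win rate 0.5)   = {p_null:.3f}")
    print(f"P(X >= 50 | win rate 0.633) = {power:.3f}")
```

Under these assumptions the first value sits close to .05 and the second close to 80%, in line with the thresholds stated in the text.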



RESULTS 

The clinical characteristics of the 811 cases and 650 controls are described in Table 3
and the distributions of their genotypes are shown in Table 2. The population of ACS
cases included 308 (38%) STEMI, 284 (35%) NSTEMI, and 219 (27%) unstable angina patients.
Cases and controls had similar age, sex, and body mass index distributions. A family
history of coronary artery disease or MI among first-degree relatives was 2.7-fold higher
in male cases than in male controls and 2.0-fold higher in female cases than in female
controls.






Table 2. Genotype Frequencies and P Values in Cases With Acute Coronary Syndrome and Controls

                                      No. (%)
Gene      Variant        Genotype  Cases        Controls     2-Tailed P Value
ABCA1     -477C/T        CC        191 (24.6)   182 (28.3)   .27
                         CT        396 (51.0)   319 (49.5)
                         TT        189 (24.4)   143 (22.2)
ABCA1     Lys219Arg      AA        65 (8.2)     46 (7.1)     [illegible]
                         AG        311 (39.3)   263 (40.6)
                         GG        416 (52.5)   338 (52.2)
ACE1      I/D            DD        233 (30.0)   185 (28.6)   [illegible]
                         DI        389 (50.1)   329 (50.9)
                         II        154 (19.8)   132 (20.4)
ADD1      Gly460Trp      GG        456 (60.7)   419 (64.9)   [illegible]
                         GT        269 (35.8)   197 (30.5)
                         TT        26 (3.5)     30 (4.6)
ADRB2     Glu27Gln†      CC        266 (34.5)   217 (34.8)   .14
                         CG        358 (46.5)   264 (42.3)
                         GG        146 (19.0)   143 (22.9)
ADRB2     Ile164Thr      CC        789 (97.8)   652 (98.9)   [illegible]
                         CT        18 (2.2)     7 (1.1)
                         TT        0            0
ADRB2     Gly16Arg       AA        128 (16.3)   100 (15.2)   [illegible]
                         AG        348 (44.3)   294 (44.8)
                         GG        309 (39.4)   262 (39.9)
ADRB3     Arg64Trp       CC        6 (0.7)      1 (0.2)      .21
                         CT        111 (13.8)   99 (15.1)
                         TT        687 (85.4)   557 (84.8)
AGT       Thr235Met      CC        143 (17.8)   107 (16.5)   [illegible]
                         CT        387 (48.3)   340 (52.6)
                         TT        272 (33.9)   200 (30.9)
AGTR1     A1166C         AA        388 (48.1)   332 (50.8)   .61
                         AC        339 (42.1)   262 (40.1)
                         CC        79 (9.8)     60 (9.2)
ALOX5AP   HAP B          B         50 (6.5)     41 (5.8)     [illegible]
                         non-B     734 (93.5)   661 (94.2)
                                   NA           NA
ALOX5AP   HAP A          A         124 (15.9)   122 (17.4)   [illegible]
                         non-A     654 (84.1)   584 (82.6)
                                   NA           NA
APOA1     C83T           AA        23 (3.0)     25 (3.8)     [illegible]
                         AG        219 (28.3)   175 (26.9)
                         GG        532 (68.7)   450 (69.2)
APOA1     -75G/A         AA        1 (0.1)      0            [illegible]
                         AG        10 (1.3)     5 (0.8)
                         GG        784 (98.6)   [illegible]
APOE      Arg158Cys      CC        29 (3.6)     15 (2.3)     .14
                         CT        209 (26.1)   154 (23.5)
                         TT        562 (70.3)   487 (74.2)
APOE      -219T/G        GG        194 (24.2)   177 (27.3)   .21
                         GT        403 (50.2)   327 (50.5)
                         TT        206 (25.7)   144 (22.2)
BDKRB2    -58C/T         CC        263 (32.8)   221 (33.6)   .43
                         CT        394 (49.1)   335 (50.9)
                         TT        145 (18.1)   102 (15.5)
CCL11     Thr23Ala‡      CC        539 (68.2)   456 (70.0)   .20
                         CT        239 (30.3)   178 (27.3)
                         TT        12 (1.5)     17 (2.6)
CCR2      Val64Ile       AA        7 (0.9)      6 (0.9)      .91
                         AG        116 (14.4)   89 (13.6)
                         GG        681 (84.7)   561 (85.5)
CCR5      Indel          II        631 (78.4)   540 (82.4)   .15
                         ID        162 (20.1)   108 (16.5)
                         DD        12 (1.5)     7 (1.1)
CD14      -159C/T        CC        204 (25.4)   193 (29.5)   .22
                         CT        395 (49.2)   306 (46.8)
                         TT        204 (25.4)   155 (23.7)
CETP      intron1 G/A    AA        168 (20.9)   135 (20.6)   .44
                         AG        387 (48.1)   297 (45.3)
                         GG        250 (31.1)   224 (34.1)
CETP      -629C/A        AA        205 (25.6)   171 (26.4)   .31
                         AC        400 (49.9)   298 (46.1)
                         CC        197 (24.6)   178 (27.5)
COMT      Val158Met†     AA        231 (29.8)   181 (28.1)   .39
                         AG        358 (46.1)   321 (49.8)
                         GG        187 (24.1)   143 (22.2)
CX3CR1    Ile249Val      CC        410 (51.1)   353 (53.6)   .43
                         CT        336 (41.9)   254 (38.6)
                         TT        56 (7.0)     51 (7.8)
CX3CR1    Thr280Met      AA        18 (2.2)     13 (2.0)     .65
                         AG        223 (27.7)   195 (29.8)
                         GG        565 (70.1)   447 (68.2)
CYP11B2   -344T/C        CC        163 (20.6)   109 (16.6)   .12
                         CT        352 (44.6)   319 (48.6)
                         TT        275 (34.8)   229 (34.9)
CYP2C9    Leu359Ile      AA        708 (92.7)   568 (90.6)   .17
                         AC        56 (7.3)     59 (9.4)
                         CC        0            0
CYP2C9    Cys144Arg*†    CC        589 (80.0)   491 (79.8)   .95
                         CT        147 (20.0)   124 (20.2)
                         TT        0            0
ENPP1     Gln121Lys      AA        600 (74.3)   498 (75.7)   .84
                         AC        192 (23.8)   149 (22.6)
                         CC        15 (1.9)     11 (1.7)
ESR1      -401T/C        CC        145 (18.0)   143 (21.8)   .20
                         CT        421 (52.3)   326 (49.7)
                         TT        239 (29.7)   187 (28.5)
F12       46C/T          CC        459 (58.0)   371 (56.6)   .84
                         CT        283 (35.8)   241 (36.7)
                         TT        49 (6.2)     44 (6.7)
F13A1     Val34Leu       GG        443 (56.8)   354 (55.8)   .93
                         GT        296 (37.9)   247 (39.0)
                         TT        41 (5.3)     33 (5.2)
F2        G20210A        AA        1 (0.1)      1 (0.2)      .94
                         AG        23 (2.9)     20 (3.0)
                         GG        783 (97.0)   635 (96.8)





(continued) 






Table 2. Genotype Frequencies and P Values in Cases With Acute Coronary Syndrome and Controls (cont)

                                      No. (%)
Gene      Variant        Genotype  Cases        Controls     2-Tailed P Value
F5        Arg506Gln      AA        1 (0.1)      1 (0.2)      .95
                         AG        36 (4.5)     31 (4.7)
                         GG        769 (95.4)   623 (95.1)
F7        Arg353Gln*     AA        16 (2.0)     6 (0.9)      .22
                         AG        148 (18.7)   129 (19.6)
                         GG        629 (79.3)   522 (79.5)
FGB       -455A/G        AA        24 (3.0)     26 (4.0)     .08
                         AG        247 (30.7)   229 (35.3)
                         GG        533 (66.3)   394 (60.7)
GJA4      C1019T         CC        401 (50.6)   311 (47.5)   .47
                         CT        313 (39.5)   272 (41.5)
                         TT        78 (9.8)     72 (11.0)
GP1BA     -5T/C          CC        13 (1.7)     8 (1.2)      .75
                         CT        168 (21.6)   138 (21.1)
                         TT        597 (76.7)   509 (77.7)
GRL       Asn363Ser      AA        756 (94.1)   608 (92.7)   .29
                         AG        47 (5.9)     48 (7.3)
                         GG        0            0
HFE       Cys282Tyr      AA        3 (0.4)      1 (0.2)      .70
                         AG        96 (12.0)    83 (12.6)
                         GG        703 (87.7)   574 (87.2)
HTR2A     Ser102Ser*     CC        286 (36.5)   275 (42.0)   .11
                         CT        363 (46.4)   278 (42.4)
                         TT        134 (17.1)   102 (15.6)
ICAM1     Lys469Glu      AA        270 (34.0)   195 (30.2)   .30
                         AG        379 (47.7)   329 (51.0)
                         GG        145 (18.3)   121 (18.8)
IL1B      -511C/T        CC        359 (47.7)   289 (44.5)   .40
                         CT        311 (41.4)   292 (44.9)
                         TT        82 (10.9)    69 (10.6)
IL6       -174G/C        CC        142 (17.6)   106 (16.1)   .73
                         CG        386 (48.0)   319 (48.5)
                         GG        277 (34.4)   233 (35.4)
IRS1      Arg971Gly      AA        3 (0.4)      3 (0.5)      .98
                         AG        84 (10.6)    69 (10.9)
                         GG        704 (89.0)   562 (88.6)
ITGA2     Phe807Phe      AA        123 (15.3)   108 (16.4)   .40
                         AG        394 (48.9)   298 (45.4)
                         GG        288 (35.8)   251 (38.2)
ITGB3     Leu33Pro       CC        20 (2.5)     14 (2.2)     .12
                         CT        188 (23.6)   182 (28.3)
                         TT        588 (73.9)   446 (69.5)
LIPC      -514T/C        CC        506 (62.9)   369 (56.8)   .03
                         CT        256 (31.8)   250 (38.5)
                         TT        42 (5.2)     31 (4.8)
LPA       Asp9Asn        AG        29 (3.7)     17 (2.6)     .29
                         GG        765 (96.3)   641 (97.4)
                         AA        0            0
LRP1      Thr3261Thr     AA        367 (46.9)   283 (43.2)   .31
                         AG        330 (42.2)   302 (46.1)
                         GG        85 (10.9)    70 (10.7)
LTA       A252G          AA        394 (49.1)   282 (42.9)   .06
                         AG        327 (40.8)   299 (45.5)
                         GG        81 (10.1)    76 (11.6)
LTA       Thr26Asn       AA        80 (10.0)    78 (11.6)    .07
                         AC        331 (41.4)   297 (45.3)
                         CC        389 (48.6)   280 (42.7)
MGP       Thr83Ala       AA        308 (38.3)   257 (39.1)   .79
                         AG        374 (46.5)   294 (44.7)
                         GG        123 (15.3)   106 (16.1)
MGP       -7A/G          AA        110 (13.6)   95 (14.5)    .77
                         AG        368 (45.7)   288 (43.8)
                         GG        328 (40.7)   274 (41.7)
MMP3      indel          DD        194 (24.7)   176 (28.4)   .27
                         DI        386 (49.1)   294 (47.5)
                         II        206 (26.2)   149 (24.1)
MTHFR     Ala222Val*     CC        350 (44.1)   272 (41.3)   .06
                         CT        341 (43.0)   320 (48.6)
                         TT        102 (12.9)   66 (10.0)
MTP       -493G/T        GG        449 (55.8)   371 (56.5)   .95
                         GT        297 (36.9)   237 (36.1)
                         TT        59 (7.3)     49 (7.5)
MTR       Asp919Gly      AA        529 (66.0)   423 (65.7)   .88
                         AG        239 (29.8)   190 (29.5)
                         GG        34 (4.2)     31 (4.8)
NPPA      Ter29ArgArg    CC        22 (2.8)     15 (2.3)     .83
                         CT        190 (23.9)   159 (24.7)
                         TT        583 (73.3)   471 (73.0)
OLR1      Lys167Asn      CC        649 (80.8)   543 (83.7)   .26
                         CG        146 (18.2)   103 (15.9)
                         GG        8 (1.0)      3 (0.5)
p22-PHOX  His72Tyr§      CC        347 (47.0)   288 (44.0)   .002
                         CT        271 (36.7)   293 (44.7)
                         TT        121 (16.4)   74 (11.3)
PAI1      indel          DD        249 (30.9)   203 (30.9)   .77
                         DI        398 (49.4)   314 (47.9)
                         II        159 (19.7)   139 (21.2)
PECAM1    Leu125Val      CC        187 (23.3)   155 (23.6)   .82
                         CG        395 (49.1)   312 (47.6)
                         GG        222 (27.6)   189 (28.8)
PECAM1    Ser563Asn      AA        200 (25.0)   163 (25.5)   .92
                         AG        386 (48.3)   312 (48.8)
                         GG        214 (26.8)   165 (25.8)
PON1      Gln192Arg      AA        396 (49.6)   324 (49.3)   >.99
                         AG        337 (42.2)   279 (42.5)
                         GG        66 (8.3)     54 (8.2)
PON2      Cys311Ser      CC        464 (57.9)   366 (55.6)   .50
                         CG        298 (37.2)   251 (38.1)
                         GG        40 (5.0)     41 (6.2)
PPARG     Ala12Pro       CC        637 (79.2)   492 (75.9)   .22
                         CG        159 (19.8)   145 (22.4)
                         GG        8 (1.0)      11 (1.7)





(continued)



Table 2. Genotype Frequencies and P Values in Cases With Acute Coronary Syndrome and Controls (cont)

                                      No. (%)
Gene      Variant        Genotype  Cases        Controls     2-Tailed P Value
PTGS2     -765G/C        CC        15 (1.9)     21 (3.3)     .17
                         CG        202 (25.5)   174 (27.1)
                         GG        576 (72.6)   447 (69.6)
RECQL2    Arg1367Cys     CC        66 (8.2)     38 (5.8)     [illegible]
                         CT        326 (40.5)   239 (36.7)
                         TT        412 (51.2)   375 (57.5)
SELE      Leu554Phe      CC        740 (91.9)   611 (92.9)   [illegible]
                         CT        63 (7.8)     47 (7.1)
                         TT        2 (0.2)      0
SELE      Ser128Arg      AA        658 (82.0)   528 (80.4)   [illegible]
                         AC        137 (17.1)   123 (18.7)
                         CC        7 (0.9)      6 (0.9)
SELP      Thr715Pro      AA        646 (80.2)   530 (80.8)   [illegible]
                         AC        150 (18.6)   124 (18.9)
                         CC        9 (1.1)      2 (0.3)
TFPI      Val264Met      AA        758 (95.9)   625 (95.0)   [illegible]
                         AG        32 (4.1)     33 (5.0)
                         GG        0            0
THBD      -33G/A         AA        801 (99.6)   638 (99.8)   .63
                         AG        3 (0.4)      1 (0.2)
                         GG        0            0
THBD      Ala25Thr       AA        794 (98.6)   652 (98.9)   .64
                         AG        11 (1.4)     7 (1.1)
                         GG        0            0
THBD      Ala455Val      CC        531 (67.0)   433 (65.9)   .91
                         CT        237 (29.9)   203 (30.9)
                         TT        24 (3.0)     21 (3.2)
THBS1     Asn700Ser      AA        614 (76.3)   507 (77.2)   .74
                         AG        177 (22.0)   136 (20.7)
                         GG        14 (1.7)     14 (2.1)
THBS2     3'UTR T/G      GG        74 (9.4)     33 (5.0)     .001
                         GT        250 (31.6)   251 (38.4)
                         TT        466 (59.0)   370 (56.6)
THBS4     Ala387Pro      CC        49 (6.1)     40 (6.1)     [illegible]
                         CG        268 (33.4)   229 (34.8)
                         GG        486 (60.5)   389 (59.1)
THPO      A5713G         AA        187 (23.3)   159 (24.2)   [illegible]
                         AG        374 (46.6)   324 (49.4)
                         GG        241 (30.0)   173 (26.4)
TLR4      Gly299Asp      AA        702 (88.7)   579 (88.4)   .89
                         AG        88 (11.1)    76 (11.6)
                         GG        1 (0.1)      0
TNF       -308G/A        AA        17 (2.1)     14 (2.2)     .27
                         AG        189 (23.5)   176 (27.2)
                         GG        597 (74.3)   457 (70.6)
TNFRSF1A  Arg92Gln       AA        784 (97.0)   627 (95.4)   .13
                         AG        24 (3.0)     30 (4.6)
                         GG        0            0

*Hardy-Weinberg equilibrium deviation in controls, P<.05 (n = 3).
†Hardy-Weinberg equilibrium deviation in cases, P<.05 (n = 5).
‡P<.001 (n = 1).
§P<.001 (n = 2).



Male and female cases were significantly more likely to be current smokers and to have
type 2 diabetes mellitus but less likely to consume at least 1 alcoholic drink per month.
Frequencies of hypercholesterolemia and hypertension were higher in female cases than in
controls; no significant differences were observed in males. Previous revascularization
had been performed in 35.6% of incident ACS cases and in none of the controls.

A total of 85 variants in 70 genes were genotyped in cases and controls. The overall
genotype call rate for these variants was 98.5% (range, 95.0%-99.8%). Two percent of all
samples were genotyped in duplicate for each marker in a blinded fashion as a measure of
genotype reproducibility. Among the 2511 repeated genotypes, 5 were discordant,
demonstrating a reproducibility of 99.8%.



Tests of Hardy-Weinberg equilibrium revealed that 1 variant violated it in both cases and
controls, at the P<.05 level; 7 violated it in cases only; and 4 violated it in controls
only (Table 1 and Table 2). This finding is not more than expected by chance (4 violations
expected by chance in each group; see the Methods section) and therefore none was excluded
from further analysis at this stage.
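For readers who want to see how a Hardy-Weinberg check of this kind works, the sketch below applies the standard 1-degree-of-freedom χ² goodness-of-fit test to three genotype counts. The article does not say which Hardy-Weinberg test was used, so this particular test is an assumption, and the function name is illustrative. The last lines reproduce the "4 violations expected by chance" figure as 85 variants times the .05 threshold.

```python
from math import erfc, sqrt


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int):
    """Return (chi-square, P value) for departure from Hardy-Weinberg proportions.

    n_aa, n_ab, n_bb are the observed counts of the two homozygotes and the
    heterozygote (passed in the order AA, AB, BB).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


# Number of variants expected to fail at P<.05 by chance alone in one group.
N_VARIANTS = 85
expected_violations = N_VARIANTS * 0.05

if __name__ == "__main__":
    # IL6 -174G/C control counts as printed in Table 2 (CC, CG, GG).
    print(hwe_chi_square(106, 319, 233))
    print(expected_violations)
```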

With respect to power parameters, the mean effective frequency (or 1-frequency, if q >0.5)
in controls of the putative risk variants studied was 0.20, and 58 (68.2%) were common
(>0.1), 25 (29.4%) were uncommon (<0.1; >0.01), and 2 (2.4%) were rare (<0.01). Our sample
had 80% power to confirm, by the Monte Carlo χ² test, a genotype-specific relative risk of
2.3 for a rare variant (q = 0.01), 1.4 for a relatively uncommon variant (q = 0.1), and
1.25 for a common allele (q = 0.5).

Table 3. Characteristics of 1461 White Participants Genotyped for 85 Genetic Variants*

                                Men (n = 944)               Women (n = 517)
Characteristics                 ACS Cases     Controls      ACS Cases     Controls
                                (n = 550)     (n = 394)     (n = 261)     (n = 256)
Age, mean (SD), y               6.7 (12.5)    6.0 (12.1)    63.1 (13.2)   61.8 (12.8)
Body mass index, mean (SD)†     29.1 (5.5)    27.9 (5.0)    29.9 (6.9)    27.7 (6.9)
Family history of CAD/MI        279 (50.7)‡   109 (27.7)    135 (51.7)‡   90 (35.5)
Prior myocardial infarction     142 (25.8)‡   0             74 (28.4)‡    0
Prior revascularization         205 (37.3)‡   0             83 (31.8)‡    0
Congestive heart failure        23 (4.2)‡     0             18 (6.9)‡     0
Hypertension                    305 (55.5)    207 (52.5)    182 (69.7)‡   126 (49.2)
Type 2 diabetes mellitus        116 (21.1)‡   42 (10.7)     77 (29.5)‡    35 (13.7)
Hypercholesterolemia            314 (57.1)    208 (52.8)    162 (62.1)‡   117 (45.7)
Postmenopausal                                              189 (68.6)‡   219 (85.5)
College graduate                166 (30.2)‡   238 (60.4)    40 (15.3)‡    72 (28.1)
Smoking <30 d ago               183 (33.3)‡   55 (14.0)     85 (32.6)‡    31 (12.1)
Alcohol frequency >1/mo         221 (40.2)‡   210 (53.3)    38 (14.6)‡    84 (32.8)

Abbreviations: ACS, acute coronary syndrome; CAD, coronary artery disease; MI, myocardial infarction.
*Data are presented as number (percentage) unless otherwise indicated.
†Body mass index is calculated as weight in kilograms divided by height in meters squared.
‡P<.001 for the comparison with controls of the same sex.

We tested whether each putative risk variant showed a significant difference in frequency
between cases and controls (Table 1). An odds ratio greater than 1 indicates that the risk
genotype was in higher frequency among cases, and if so, the genotype frequency difference
was reported as a positive decimal number. Only 1 genetic variant was significant at the
P<.05 level, which is the number most likely by chance alone. The -455 variant, which lies
upstream of the transcription initiation site in the β-fibrinogen gene, replicated the
originally reported association, with the GG genotype being more frequent in cases than
controls (frequency, 66% in cases vs 61% in controls; odds ratio, 1.27; P=.03). In
addition, we found the MEF2A 21-bp deletion in 1 case and 1 control, confirming that this
is a rare variant in the population. 105
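As a worked illustration of the odds ratio and genotype frequency difference reported for the -455 variant, the sketch below recomputes both from the FGB genotype counts printed in Table 2 (GG against AG plus AA). The confidence interval uses the Woolf (log odds ratio) method; the article does not state how its intervals were obtained, so that choice is an assumption, as are the function and variable names.

```python
from math import exp, log, sqrt


def odds_ratio_ci(case_risk, case_other, ctrl_risk, ctrl_other, z=1.96):
    """Odds ratio with a Woolf (log-scale) confidence interval for a 2 x 2 table."""
    odds_ratio = (case_risk * ctrl_other) / (case_other * ctrl_risk)
    se_log = sqrt(1 / case_risk + 1 / case_other + 1 / ctrl_risk + 1 / ctrl_other)
    lower = exp(log(odds_ratio) - z * se_log)
    upper = exp(log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# FGB -455A/G counts from Table 2: GG is the risk genotype.
CASES_GG, CASES_OTHER = 533, 247 + 24
CONTROLS_GG, CONTROLS_OTHER = 394, 229 + 26

fgb_or, fgb_lower, fgb_upper = odds_ratio_ci(
    CASES_GG, CASES_OTHER, CONTROLS_GG, CONTROLS_OTHER
)

# Difference in risk genotype frequency, cases minus controls.
freq_diff = CASES_GG / (CASES_GG + CASES_OTHER) - CONTROLS_GG / (
    CONTROLS_GG + CONTROLS_OTHER
)

if __name__ == "__main__":
    print(f"OR = {fgb_or:.2f} ({fgb_lower:.2f}-{fgb_upper:.2f})")
    print(f"frequency difference = {freq_diff:.4f}")
```

With these counts the odds ratio rounds to 1.27 and the frequency difference to 0.0558, matching the values quoted in the text and in Table 1.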

Several supplementary analyses were performed. When the genotypes of cases and controls
were analyzed by extension of 2 × 3 χ² tests to 100 000 simulations, 4 loci, RECQL2, THBS2,
LIPC, and p22-PHOX, were marginally significant (Table 2). In each case, the specific
genetic risk model providing significance was different from that reported in the
literature; hence, these cannot be considered formal replications and the total number of
positive associations is not in excess of random expectations.
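A Monte Carlo χ² test on a 2 × 3 genotype table can be sketched as a label permutation: case and control labels are shuffled many times and the share of shuffles with a statistic at least as large as the observed one is taken as the P value. This is one common way to run such a test and is assumed here; the article does not describe its exact procedure. The example uses the LIPC -514T/C counts from Table 2, with far fewer simulations than the 100 000 mentioned in the text to keep the run time short. All names are illustrative.

```python
import random
from collections import Counter


def chi_square(cases, controls):
    """Pearson chi-square statistic for a 2 x k table of genotype counts."""
    n_cases, n_controls = sum(cases), sum(controls)
    total = n_cases + n_controls
    stat = 0.0
    for a, b in zip(cases, controls):
        column = a + b
        if column == 0:
            continue  # an empty genotype class contributes nothing
        exp_a = column * n_cases / total
        exp_b = column * n_controls / total
        stat += (a - exp_a) ** 2 / exp_a + (b - exp_b) ** 2 / exp_b
    return stat


def monte_carlo_p(cases, controls, n_sim=2000, seed=0):
    """Permutation P value: share of shuffles at least as extreme as observed."""
    rng = random.Random(seed)
    observed = chi_square(cases, controls)
    k = len(cases)
    # One entry per participant, holding that participant's genotype index.
    pool = [g for g in range(k) for _ in range(cases[g] + controls[g])]
    n_cases = sum(cases)
    hits = 0
    for _ in range(n_sim):
        rng.shuffle(pool)
        counts = Counter(pool[:n_cases])  # first n_cases entries act as cases
        sim_cases = [counts.get(g, 0) for g in range(k)]
        sim_controls = [cases[g] + controls[g] - sim_cases[g] for g in range(k)]
        if chi_square(sim_cases, sim_controls) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_sim + 1)


# LIPC -514T/C genotype counts from Table 2, in the order CC, CT, TT.
LIPC_CASES = (506, 256, 42)
LIPC_CONTROLS = (369, 250, 31)

if __name__ == "__main__":
    print(chi_square(LIPC_CASES, LIPC_CONTROLS))
    print(monte_carlo_p(LIPC_CASES, LIPC_CONTROLS))
```

The add-one correction in the returned P value keeps the estimate from ever being exactly zero, which is a common convention for permutation tests rather than something taken from the article.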

Finally, we found that only 41 of 84 predefined risk variants were even marginally more
frequent in cases than in controls (excluding 1 tie, the rare MEF2A deletion), representing
a 48.8% win rate (95% confidence interval, 38.1%-59.5%) for the collective-risk genotypes.
This observed proportion of wins is not different from the expected proportion (50%) under
the null hypothesis (P=.91). Table 1 shows that the absolute differences in risk genotype
frequencies between cases and controls (negative signs meaning that the putative risk
genotype was more frequent in controls than in cases) were small, with a median difference
of -0.0003, and maximum of 0.056 (β-fibrinogen).
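The win rate, its confidence interval, and the Sign test P value can be reproduced from the two counts in the paragraph above. The sketch below assumes a normal-approximation (Wald) interval and a two-sided exact binomial test; the article does not name the methods it used, so both are assumptions that happen to agree with the printed figures. The function name is illustrative.

```python
from math import comb, sqrt


def sign_test_summary(wins: int, n: int, z: float = 1.96):
    """Return (win rate, CI lower, CI upper, two-sided exact Sign test P value)."""
    rate = wins / n
    half_width = z * sqrt(rate * (1 - rate) / n)  # Wald interval half-width
    lower_tail = sum(comb(n, i) for i in range(0, wins + 1)) / 2**n
    upper_tail = sum(comb(n, i) for i in range(wins, n + 1)) / 2**n
    p_two_sided = min(1.0, 2 * min(lower_tail, upper_tail))
    return rate, rate - half_width, rate + half_width, p_two_sided


rate, ci_low, ci_high, p_value = sign_test_summary(wins=41, n=84)

if __name__ == "__main__":
    print(f"win rate {rate:.1%}, 95% CI {ci_low:.1%}-{ci_high:.1%}, P = {p_value:.2f}")
```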

COMMENT 

We were unable to confirm as risk factors for ACS 85 genetic variants because none was
unequivocally validated in this large case-control study of 1461 participants. In the
primary analysis, only the -455 promoter variant (in β-fibrinogen) was nominally
statistically significant (P=.03). Among the 4 variants in the secondary analysis that met
nominal statistical thresholds, there was an excess of a different variant than was
previously reported among cases in the original study, which does not support replication.
We therefore conclude that our findings, in this large sample of well-characterized ACS
patients and controls, cannot support that this panel of gene variants contains bona fide
ACS risk factors.

Our findings come at a critical juncture in complex disease genetics. Some cardiovascular
gene variants (eg, ACE, AGT, AGTR1, ITGB3, F2, F5, MTHFR) included in our study can already
be ordered clinically, for indications that explicitly include possible ACS risk. However,
our findings suggest that such clinical genetic testing is premature and underscore the
importance of robust replication studies of reported associations prior to their
application to clinical care.

These nonreplications include variants in several high-profile studies. For example,
haplotypes A and B of 5-lipoxygenase activating protein (ALOX5AP) were reported in 1 study
to be associated with MI in the general populations of Iceland and the United Kingdom,
respectively. 17 We found neither haplotype was associated with ACS, in spite of our
observed haplotype frequencies in cases and controls closely approximating those found in
the total United Kingdom data set (cases and controls) previously (haplotype A, 0.165 vs
0.160, respectively; haplotype B, 0.062 vs 0.058).

Although our study raises significant doubts about the collective panel of putative
genetic risk factors, it does not invalidate any particular previous study. Possible
explanations of our negative results could include: (1) false-negative results in our
study; (2) false-positive associations in previous studies; and (3) varied effects of risk
variants in different genetic backgrounds.

False-negative results as a general explanation for our study's null findings are unlikely
given that our sample size is substantially larger than all but a few reported prior
studies and was powered to detect modest relative risks. Based on a random sample (n=30)
of articles included in this study (1 per gene variant), we estimated that the mean odds
ratio reported in positive studies was 2.3 (range, 1.25-5.0), indicating that we had well
in excess of 80% power to replicate most reports. However, isolated positive reports may
overestimate genetic risks. 3,6 Recently, a meta-analysis of 14 genes included in our study
reported odds ratios ranging from 1.10 to 1.73 for risk of MI. 3 It is possible that minute
odds ratios are to be expected in complex disease genetics and that neither our study nor
most previous studies were sufficiently powered. Accordingly, we augmented our power, by
use of the Sign test, to detect a surplus of as few as 16 weakly positive genetic risk
factors among the entire set that we genotyped (34 + 16 = 50, the number required for a
significant Sign test), corresponding to a mean odds ratio of 1.05 or higher given our
sample size and the average risk genotype frequency.
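The figure of 16 can be read as simple bookkeeping: if 16 of the 84 variants were truly associated and always came out ahead in cases, the other 68 would be expected to win half the time, giving 34 + 16 = 50 wins. The sketch below works through that arithmetic under exactly this simplifying assumption (true variants always win), which is stronger than the "weakly positive" wording in the text; the function names are illustrative.

```python
def expected_wins(n_variants: int, n_true: int, null_win_rate: float = 0.5) -> float:
    """Expected wins if n_true variants always win and the rest win at the null rate."""
    return n_true + (n_variants - n_true) * null_win_rate


def min_true_variants(n_variants: int, wins_needed: int, null_win_rate: float = 0.5):
    """Smallest number of always-winning variants whose expected wins reach wins_needed."""
    for n_true in range(n_variants + 1):
        if expected_wins(n_variants, n_true, null_win_rate) >= wins_needed:
            return n_true
    return None


if __name__ == "__main__":
    print(expected_wins(84, 16))      # 16 true variants plus half of the other 68
    print(min_true_variants(84, 50))  # surplus needed to reach 50 expected wins
```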

Absence of genetic effect only in our cohort is also unlikely. Cases showed a 2-fold
higher family history of ACS, consistent with a genetic effect contributing to phenotypes
in this cohort. In addition, homozygosity coding for an arginine residue at position 158
of apolipoprotein E (E4 variant), considered 1 of the least controversial of the putative
ACS susceptibility factors despite some inconsistency in certain cohorts, 106 was
significantly associated (P=.04) among cases with hyperlipidemia (4.1%) vs controls
without hyperlipidemia (1.6%).

False-positive results in previous studies are another potential explanation for the
discrepancy between our findings and those of others. This issue has previously been
recognized as a serious problem with association studies, particularly when sample sizes
are underpowered. 107 It is difficult to identify true vs false positives by analysis of
the literature alone. 108 Unrecognized stratification between cases and controls can
create spurious associations, 109 and the absence of negative genomic controls in nearly
all prior studies to exclude this possibility leaves this an open question. Also difficult
to assess is the extent to which publication bias and multiple hypothesis testing have had
an effect.

It could be argued that our research 
participants are distinct from those re- 
ported previously and that our results 
may not bear on the validity of posi- 
tive associations reported in different 
populations and clinical subgroups (eg, 
analyses substratified by age, sex, or a 
clinical variable, such as hyperten- 
sion, hyperlipidemia, or smoking sta- 
tus). Given that the vast majority of 
common variants in the human ge- 
nome date to our shared ancestry in 
Africa, 110 it is not likely that there are 
different common functional variants 
in linkage disequilibrium with risk vari- 



ants in our population vs others. Less 
common mutations of more recent an- 
cestral origin could conceivably be cor- 
related with certain genetic variants in 
one population but not another. The ex- 
tent to which linkage disequilibrium 
patterns might explain our findings is 
unknown, but our study population is 
quite typical of the mixed European 
background that is prevalent in the 
United States. 

Another possibility is that the effect 
of risk variants is different in different 
genetic backgrounds; if true, the lack 
of generalizability of results will se- 
verely limit their application to the clini- 
cal arena. The fact that we failed to rep- 
licate positive associations in a 
consecutive series of study partici- 
pants that are broadly representative of 
the disease encountered in clinical prac- 
tice places limitations on the potential 
applicability of prior findings and sup- 
ports our premise that it is premature 
to extrapolate these earlier findings to 
routine clinical care. 

The failure of the candidate gene ap- 
proach to identify variants conferring 
susceptibility to ACS risk prompts con- 
sideration of other approaches. One 
promising approach is to screen the en- 
tire genome in an unbiased way in a 
large sample for variants that are sig- 
nificantly associated with disease risk. 
Coupled with the understanding of un- 
derlying patterns of linkage disequilib- 
rium in the human genome 7 and the 
ability to inexpensively obtain geno- 
types across the genome, the field is 
moving rapidly toward a comprehen- 
sive genome-wide approach. Chal- 
lenges of this approach include the un- 
known number of variants that impart 
effect, the magnitude of the effect im- 
parted by each, and the extent to which 
common variants as opposed to rare in- 
dependent mutations account for dis- 
ease risk. 

Regardless of the approach taken, it 
is clear that multiple large, well- 
matched cohorts of cases and controls 
will be required to achieve valid 
progress in the genetic analysis of ACS 
and other complex human diseases. 
Our null findings indicate the need for 
caution in the interpretation of ge- 
netic associations in different clinical 
populations and the need for exten- 
sive validation of genetic risk factors. 

Author Contributions: Dr Morgan had full access to 
all the data in the study and takes responsibility for 
the integrity of the data and the accuracy of the data 
analysis. 

Study concept and design: Morgan, Lifton, Krumholz, Spertus. 
Acquisition of data: Morgan, Lifton, Krumholz, Spertus. 
Analysis and interpretation of data: Morgan, Lifton, Krumholz, Spertus. 
Drafting of the manuscript: Morgan, Lifton, Krumholz, Spertus. 
Critical revision of the manuscript for important intellectual content: Morgan, Lifton, Krumholz. 
Statistical analysis: Morgan, Lifton. 
Obtained funding: Morgan, Lifton, Spertus. 
Administrative, technical, or material support: Lifton, Spertus. 
Study supervision: Lifton, Krumholz, Spertus. 
Financial Disclosures: Dr Spertus reports that he serves 
on the advisory boards of the American College of Car- 
diology, American Heart Association, Amgen, United 
Healthcare, Blue Cross/Blue Shield; has received grants 
from the National Institutes of Health (NIH), Amgen, 
CV Therapeutics, Flowcardia, and Roche Diagnos- 
tics (in-kind biomarker reagent supplies for an NIH 
grant); has ownership interests in the Seattle Angina 
Questionnaire, the Kansas City Cardiomyopathy Ques- 
tionnaire, the Peripheral Artery Questionnaire, and 
Health Outcomes Sciences; and has consulted within 
the past 5 years for CV Therapeutics, Amgen, World- 
heart, and Otsuka Pharmaceuticals. Dr Krumholz re- 
ports that he has research contracts with the Colo- 
rado Foundation for Medical Care and the American 
College of Cardiology, serves on the advisory boards 
for Amgen, Alere, and United Healthcare, and is a subject- 
matter expert for VHA Inc. Drs Morgan and Lifton re- 
port no conflicts of interest. 
Funding/Support: This project was funded by grants 
from the Saint Luke's Hospital Foundation, Kansas City, 
Mo, and by grant R-01 HS11282-01 from the Agency 
for Healthcare Research and Quality. Dr Morgan's re- 
search in Dr Lifton's laboratory at Yale University was 
supported by Howard Hughes Medical Institute and 
by grant NHLBI K23 HL77272, a mentored patient- 
oriented research grant from the National Heart, Lung, 
and Blood Institute. 

Role of the Sponsor: None of the funding organiza- 
tions had any role in the design and conduct of the 
study; collection, management, analysis, and inter- 
pretation of the data; and preparation, review, or ap- 
proval of the manuscript. 

Acknowledgment: We thank Donna Buchanan, PhD, 
Mid-America Heart Institute, Kansas City, Mo, for edi- 
torial assistance with the manuscript, which she provided 
as part of her duties; she received no additional compensation. 



REFERENCES 

1. Marenberg ME, Risch N, Berkman LF, Floderus B, 
de Faire U. Genetic susceptibility to death from coro- 
nary heart disease in a study of twins. N Engl J Med. 
1994;330:1041-1046. 

2. Scheuner MT. Clinical application of genetic risk 
assessment strategies for coronary artery disease: geno- 
types, phenotypes, and family history. Prim Care. 2004; 
31:711-737, xi-xii. 

3. Casas JPCJ, Miller GJ, Hingorani AD, Humphries 
SE. Investigating the genetic determinants of cardio- 
vascular disease using candidate genes and meta- 
analysis of association studies. Ann Hum Genet. 2006; 
70:145-169. 

4. Morgan TM, Coffey CS, Krumholz HM. Overes- 
timation of genetic risks owing to small sample sizes 
in cardiovascular studies. Clin Genet. 2003;64:7-17. 

©2007 American Medical Association. All rights reserved. 

(Reprinted) JAMA, April 11, 2007, Vol 297, No. 14, p 1559 

5. Yamada Y. Identification of genetic factors and de- 
velopment of genetic risk diagnosis systems for car- 
diovascular diseases and stroke. Circ J. 2006;70:1240- 
1248. 

6. Ioannidis JP, Ntzani EE, Trikalinos TA, Contopoulos- 
Ioannidis DG. Replication validity of genetic associa- 
tion studies. Nat Genet. 2001;29:306-309. 

7. The International HapMap Consortium. The Inter- 
national HapMap Project. Nature. 2003;426:789-796. 

8. Lutucuta S, Ballantyne CM, Elghannam H, Gotto 
AM Jr, Marian AJ. Novel polymorphisms in promoter 
region of ATP binding cassette transporter gene and 
plasma lipids, severity, progression, and regression of 
coronary atherosclerosis and response to therapy. Circ 
Res. 2001;88:969-973. 

9. Zwarts KY, Clee SM, Zwinderman AH, et al. ABCA1 
regulatory variants influence coronary artery disease 
independent of effects on plasma lipid levels. Clin 
Genet. 2002;61:115-125. 

10. Clee SM, Zwinderman AH, Engert JC, et al. Com- 
mon genetic variation in ABCA1 is associated with al- 
tered lipoprotein levels and a modified risk for coro- 
nary artery disease. Circulation. 2001;103:1198-1205. 

11. Tregouet DA, Ricard S, Nicaud V, et al. In-depth 
haplotype analysis of ABCA1 gene polymorphisms in 
relation to plasma ApoA1 levels and myocardial 
infarction. Arterioscler Thromb Vasc Biol. 2004;24:775- 
781. 

12. Tobin MD, Braund PS, Burton PR, et al. Geno- 
types and haplotypes predisposing to myocardial in- 
farction: a multilocus case-control study. Eur Heart J. 
2004;25:459-467. 

13. Zee RY, Cook NR, Reynolds R, Cheng S, Ridker 
PM. Haplotype analysis of the beta2 adrenergic re- 
ceptor gene and risk of myocardial infarction in humans. 
Genetics. 2005;169:1583-1587. 

14. Higashi K, Ishikawa T, Ito T, Yonemura A, Shige 
H, Nakamura H. Association of a genetic variation in 
the beta 3-adrenergic receptor gene with coronary 
heart disease among Japanese. Biochem Biophys Res 
Commun. 1997;232:728-730. 

15. Sethi AA, Nordestgaard BG, Tybjaerg-Hansen A. 
Angiotensinogen gene polymorphism, plasma angio- 
tensinogen, and risk of hypertension and ischemic heart 
disease: a meta-analysis. Arterioscler Thromb Vasc Biol. 
2003;23:1269-1275. 

16. Fatini C, Abbate R, Pepe G, et al. Searching for a 
better assessment of the individual coronary risk pro- 
file: the role of angiotensin-converting enzyme, an- 
giotensin II type 1 receptor and angiotensinogen gene 
polymorphisms. Eur Heart J. 2000;21:633-638. 

17. Helgadottir A, Manolescu A, Thorleifsson G, et al. 
The gene encoding 5-lipoxygenase activating pro- 
tein confers risk of myocardial infarction and stroke. 
Nat Genet. 2004;36:233-239. 

18. Wang XL, Liu SX, McCredie RM, Wilcken DE. Poly- 
morphisms at the 5'-end of the apolipoprotein AI gene 
and severity of coronary artery disease. J Clin Invest. 
1996;98:372-377. 

19. Reguero JR, Cubero GI, Batalla A, et al. Apolipo- 
protein A1 gene polymorphisms and risk of early coro- 
nary disease. Cardiology. 1998;90:231-235. 

20. Wilson PW, Schaefer EJ, Larson MG, Ordovas 
JM. Apolipoprotein E alleles and risk of coronary dis- 
ease: a meta-analysis. Arterioscler Thromb Vasc Biol. 
1996;16:1250-1255. 

21. Lambert JC, Brousseau T, Defosse V, et al. Inde- 
pendent association of an APOE gene promoter poly- 
morphism with increased risk of myocardial infarc- 
tion and decreased APOE plasma concentrations-the 
ECTIM study. Hum Mol Genet. 2000;9:57-61. 

22. Aoki S, Mukae S, Itoh S, et al. The genetic factor 
in acute myocardial infarction with hypertension. Jpn 
Circ J. 2001;65:621-626. 

23. Zee RY, Cook NR, Cheng S, et al. Threonine for 
alanine substitution in the eotaxin (CCL11) gene 
and the risk of incident myocardial infarction. 
Atherosclerosis. 2004;175:91-94. 

24. Ortlepp JR, Vesper K, Mevissen V, et al. Chemo- 
kine receptor (CCR2) genotype is associated with myo- 
cardial infarction and heart failure in patients under 
65 years of age. J Mol Med. 2003;81:363-367. 

25. Gonzalez P, Alvarez R, Batalla A, et al. Genetic 
variation at the chemokine receptors CCR5/CCR2 in 
myocardial infarction. Genes Immun. 2001;2:191-195. 

26. Hubacek JA, Rothe G, Pit'ha J, et al. C(-260)→T 
polymorphism in the promoter of the CD14 mono- 
cyte receptor gene as a risk factor for myocardial 
infarction. Circulation. 1999;99:3218-3220. 

27. Kuivenhoven JA, Jukema JW, Zwinderman AH, 
et al; the Regression Growth Evaluation Statin Study 
Group. The role of a common variant of the cho- 
lesteryl ester transfer protein gene in the progression 
of coronary atherosclerosis. N Engl J Med. 1998;338: 
86-93. 

28. Klerkx AH, Tanck MW, Kastelein JJ, et al. Hap- 
lotype analysis of the CETP gene: not TaqIB, but the 
closely linked -629C->A polymorphism and a novel 
promoter variant are independently associated with 
CETP concentration. Hum Mol Genet. 2003;12:111- 
123. 

29. Eriksson AL, Skrtic S, Niklason A, et al. Associa- 
tion between the low activity genotype of catechol- 
O-methyltransferase and myocardial infarction in a hy- 
pertensive population. Eur Heart J. 2004;25:386-391. 

30. Niessner A, Marculescu R, Haschemi A, et al. Op- 
posite effects of CX3CR1 receptor polymorphisms 
V249I and T280M on the development of acute coro- 
nary syndrome: a possible implication of fractalkine 
in inflammatory activation. Thromb Haemost. 2005; 
93:949-954. 

31. McDermott DH, Halcox JP, Schenke WH, et al. 
Association between polymorphism in the chemo- 
kine receptor CX3CR1 and coronary vascular endo- 
thelial dysfunction and atherosclerosis. Circ Res. 2001; 
89:401-407. 

32. Patel S, Steeds R, Channer K, Samani NJ. Analy- 
sis of promoter region polymorphism in the 
aldosterone synthase gene (CYP11B2) as a risk fac- 
tor for myocardial infarction. Am J Hypertens. 2000; 
13:134-139. 

33. Hautanen A, Toivanen P, Manttari M, et al. Joint 
effects of an aldosterone synthase (CYP11B2) gene 
polymorphism and classic risk factors on risk of myo- 
cardial infarction. Circulation. 1999;100:2213-2218. 

34. Yasar U, Bennet AM, Eliasson E, et al. Allelic vari- 
ants of cytochromes P450 2C modify the risk for acute 
myocardial infarction. Pharmacogenetics. 2003;13:715- 
720. 

35. Funk M, Endler G, Freitag R, et al. CYP2C9*2 and 
CYP2C9*3 alleles confer a lower risk for myocardial 
infarction. Clin Chem. 2004;50:2395-2398. 

36. Endler G, Mannhalter C, Sunder-Plassmann H, et al. 
The K121Q polymorphism in the plasma cell mem- 
brane glycoprotein 1 gene predisposes to early myo- 
cardial infarction. J Mol Med. 2002;80:791-795. 

37. Schuit SC, Oei HH, Witteman JC, et al. Estrogen 
receptor alpha gene polymorphisms and risk of myo- 
cardial infarction. JAMA. 2004;291:2969-2977. 

38. Shearman AM, Cupples LA, Demissie S, et al. As- 
sociation between estrogen receptor alpha gene varia- 
tion and cardiovascular disease. JAMA. 2003;290: 
2263-2270. 

39. Endler G, Mannhalter C, Sunder-Plassmann H, et al. 
Homozygosity for the C->T polymorphism at nucleo- 
tide 46 in the 5' untranslated region of the factor XII 
gene protects from development of acute coronary 
syndrome. Br J Haematol. 2001;115:1007-1009. 

40. Endler G, Mannhalter C. Polymorphisms in co- 
agulation factor genes and their impact on arterial and 
venous thrombosis. Clin Chim Acta. 2003;330:31-55. 

41. Rosendaal FR, Siscovick DS, Schwartz SM, Psaty 
BM, Raghunathan TE, Vos HL. A common prothrom- 
bin variant (20210 G to A) increases the risk of 
myocardial infarction in young women. Blood. 1997; 
90:1747-1750. 

42. Girelli D, Russo C, Ferraresi P, et al. Polymor- 
phisms in the factor VII gene and the risk of myocar- 
dial infarction in patients with coronary artery disease. 
N Engl J Med. 2000;343:774-780. 

43. Boekholdt SM, Bijsterveld NR, Moons AH, Levi 
M, Buller HR, Peters RJ. Genetic variation in coagu- 
lation and fibrinolytic proteins and their relation with 
acute myocardial infarction: a systematic review. 
Circulation. 2001;104:3063-3068. 

44. Yamada Y, Izawa H, Ichihara S, et al. Prediction 
of the risk of myocardial infarction from polymor- 
phisms in candidate genes. N Engl J Med. 2002;347: 
1916-1923. 

45. Kenny D, Muckian C, Fitzgerald DJ, Cannon CP, 
Shields DC. Platelet glycoprotein Ib alpha receptor poly- 
morphisms and recurrent ischaemic events in acute 
coronary syndrome patients. J Thromb Thrombolysis. 
2002;13:13-19. 

46. Douglas H, Michaelides K, Gorog DA, et al. Plate- 
let membrane glycoprotein Ibalpha gene -5T/C Kozak 
sequence polymorphism as an independent risk fac- 
tor for the occurrence of coronary thrombosis. Heart. 
2002;87:70-74. 

47. Lin RC, Wang XL, Morris BJ. Association of coro- 
nary artery disease with glucocorticoid receptor N363S 
variant. Hypertension. 2003;41:404-407. 

48. Hetet G, Elbaz A, Gariepy J, et al. Association stud- 
ies between haemochromatosis gene mutations and 
the risk of cardiovascular diseases. Eur J Clin Invest. 
2001;31:382-388. 

49. Yamada S, Akita H, Kanazawa K, et al. T102C poly- 
morphism of the serotonin (5-HT) 2A receptor gene 
in patients with non-fatal acute myocardial infarction. 
Atherosclerosis. 2000;150:143-148. 

50. Jiang H, Klein RM, Niederacher D, et al. C/T poly- 
morphism of the intercellular adhesion molecule-1 gene 
(exon 6, codon 469): a risk factor for coronary heart 
disease and myocardial infarction. Int J Cardiol. 2002; 
84:171-177. 

51. Momiyama Y, Hirano R, Taniguchi H, Naka- 
mura H, Ohsuzu F. Effects of interleukin-1 gene poly- 
morphisms on the development of coronary artery dis- 
ease associated with Chlamydia pneumoniae infection. 
J Am Coll Cardiol. 2001;38:712-717. 

52. Georges JL, Loukaci V, Poirier O, et al; Etude Cas- 
Temoin de l'Infarctus du Myocarde. Interleukin-6 gene 
polymorphisms and susceptibility to myocardial in- 
farction: the ECTIM study. J Mol Med. 2001;79:300- 
305. 

53. Jenny NS, Tracy RP, Ogg MS, et al. In the el- 
derly, interleukin-6 plasma levels and the -174G>C 
polymorphism are associated with the development 
of cardiovascular disease. Arterioscler Thromb Vasc 
Biol. 2002;22:2066-2071. 

54. Baroni MG, D'Andrea MP, Montali A, et al. A com- 
mon mutation of the insulin receptor substrate-1 gene 
is a risk factor for coronary artery disease. Arterio- 
scler Thromb Vasc Biol. 1999;19:2975-2980. 

55. Santoso S, Kunicki TJ, Kroll H, Haberbosch W, Gar- 
demann A. Association of the platelet glycoprotein Ia 
C807T gene polymorphism with nonfatal myocardial 
infarction in younger patients. Blood. 1999;93:2449- 
2453. 

56. Samara WM, Gurbel PA. The role of platelet re- 
ceptors and adhesion molecules in coronary artery 
disease. Coron Artery Dis. 2003;14:65-79. 

57. Zambon A, Deeb SS, Pauletto P, Crepaldi G, Brun- 
zell JD. Hepatic lipase: a marker for cardiovascular dis- 
ease risk and response to therapy. Curr Opin Lipidol. 
2003;14:179-189. 

58. Ji J, Herbison CE, Mamotte CD, Burke V, Taylor 
RR, van Bockxmeer FM. Hepatic lipase gene -514 C/T 
polymorphism and premature coronary heart disease. 
J Cardiovasc Risk. 2002;9:105-113. 

59. Hokanson JE. Functional variants in the lipopro- 
tein lipase gene and risk cardiovascular disease. Curr 
Opin Lipidol. 1999;10:393-399. 

60. Schulz S, Schagdarsurengin U, Greiser P, et al. The 
LDL receptor-related protein (LRP1/A2MR) and coro- 
nary atherosclerosis-novel genomic variants and func- 
tional consequences. Hum Mutat. 2002;20:404. 

61. PROCARDIS Consortium. A trio family study 
showing association of the lymphotoxin-alpha N26 
(804A) allele with coronary artery disease. Eur J Hum 
Genet. 2004;12:770-774. 

62. Ozaki K, Ohnishi Y, Iida A, et al. Functional SNPs 
in the lymphotoxin-alpha gene that are associated with 
susceptibility to myocardial infarction. Nat Genet. 2002; 
32:650-654. 

63. Herrmann SM, Whatling C, Brand E, et al. Poly- 
morphisms of the human matrix gla protein (MGP) 
gene, vascular calcification, and myocardial infarction. 
Arterioscler Thromb Vasc Biol. 2000;20:2386-2393. 

64. Humphries SE, Martin S, Cooper J, Miller G. In- 
teraction between smoking and the stromelysin-1 
(MMP3) gene 5A/6A promoter polymorphism and risk 
of coronary heart disease in healthy men. Ann Hum 
Genet. 2002;66:343-352. 

65. Lamblin N, Bauters C, Hermant X, Lablanche JM, 
Helbecque N, Amouyel P. Polymorphisms in the pro- 
moter regions of MMP-2, MMP-3, MMP-9 and 
MMP-12 genes as determinants of aneurysmal coro- 
nary artery disease. J Am Coll Cardiol. 2002;40:43-48. 

66. Klerk M, Verhoef P, Clarke R, Blom HJ, Kok FJ, 
Schouten EG. MTHFR 677C→T polymorphism and risk 
of coronary heart disease: a meta-analysis. JAMA. 
2002;288:2023-2031. 

67. Ledmyr H, McMahon AD, Ehrenborg E, et al. The 
microsomal triglyceride transfer protein gene -493T vari- 
ant lowers cholesterol but increases the risk of coro- 
nary heart disease. Circulation. 2004;109:2279-2284. 

68. Juo SH, Han Z, Smith JD, Colangelo L, Liu K. Com- 
mon polymorphism in promoter of microsomal tri- 
glyceride transfer protein gene influences choles- 
terol, ApoB, and triglyceride levels in young African 
American men: results from the coronary artery risk 
development in young adults (CARDIA) study. Arte- 
rioscler Thromb Vasc Biol. 2000;20:1316-1322. 

69. Hyndman ME, Bridge PJ, Warnica JW, Fick G, Par- 
sons HG. Effect of heterozygosity for the methionine 
synthase 2756 A→G mutation on the risk for recur- 
rent cardiovascular events. Am J Cardiol. 2000;86: 
1144-1146. A1 149. 

70. Gruchala M, Ciecwierz D, Wasag B, et al. Asso- 
ciation of the ScaI atrial natriuretic peptide gene poly- 
morphism with nonfatal myocardial infarction and ex- 
tent of coronary artery disease. Am Heart J. 2003;145: 
125-131. 

71. Tatsuguchi M, Furutani M, Hinagata J, et al. Oxi- 
dized LDL receptor gene (OLR1) is associated with the 
risk of myocardial infarction. Biochem Biophys Res 
Commun. 2003;303:247-250. 

72. Gardemann A, Mages P, Katz N, Tillmanns H, 
Haberbosch W. The p22 phox A640G gene polymor- 
phism but not the C242T gene variation is associated 
with coronary heart disease in younger individuals. 
Atherosclerosis. 1999;145:315-323. 

73. Inoue N, Kawashima S, Kanazawa K, Yamada S, 
Akita H, Yokoyama M. Polymorphism of the NADH/ 
NADPH oxidase p22-phox gene in patients with coro- 
nary artery disease. Circulation. 1998;97:135-137. 

74. Wenzel K, Baumann G, Felix SB. The homozy- 
gous combination of Leu125Val and Ser563Asn poly- 
morphisms in the PECAM1 (CD31) gene is associ- 
ated with early severe coronary heart disease. Hum 
Mutat. 1999;14:545. 

75. Andreotti F, Porto I, Crea F, Maseri A. Inflamma- 
tory gene polymorphisms and ischaemic heart dis- 
ease: review of population association studies. Heart. 
2002;87:107-112. 

76. Durrington PN, Mackness B, Mackness MI. Para- 
oxonase and atherosclerosis. Arterioscler Thromb Vasc 
Biol. 2001;21:473-480. 

77. Sanghera DK, Aston CE, Saha N, Kamboh MI. DNA 
polymorphisms in two paraoxonase genes (PON1 and 
PON2) are associated with the risk of coronary heart 
disease. Am J Hum Genet. 1998;62:36-44. 

78. Ridker PM, Cook NR, Cheng S, et al. Alanine for 
proline substitution in the peroxisome proliferator- 
activated receptor gamma-2 (PPARG2) gene and the 
risk of incident myocardial infarction. Arterioscler 
Thromb Vasc Biol. 2003;23:859-863. 

79. Cipollone F, Toniato E, Martinotti S, et al. A poly- 
morphism in the cyclooxygenase 2 gene as an inher- 
ited protective factor against myocardial infarction and 
stroke. JAMA. 2004;291:2221-2228. 

80. Ye L, Miki T, Nakura J, et al. Association of a poly- 
morphic variant of the Werner helicase gene with myo- 
cardial infarction in a Japanese population. Am J Med 
Genet. 1997;68:494-498. 

81. Herrmann SM, Ricard S, Nicaud V, et al. The P- 
selectin gene is highly polymorphic: reduced fre- 
quency of the Pro715 allele carriers in patients with 
myocardial infarction. Hum Mol Genet. 1998;7:1277- 
1284. 

82. Moatti D, Seknadji P, Galand C, et al. Polymor- 
phisms of the tissue factor pathway inhibitor (TFPI) 
gene in patients with acute coronary syndromes and 
in healthy subjects: impact of the V264M substitu- 
tion on plasma levels of TFPI. Arterioscler Thromb Vasc 
Biol. 1999;19:862-869. 

83. Chao TH, Li YH, Chen JH, et al. Relation of throm- 
bomodulin gene polymorphisms to acute myocardial 
infarction in patients ≤50 years of age. Am J 
Cardiol. 2004;93:204-207. 

84. Doggen CJ, Kunz G, Rosendaal FR, et al. A mu- 
tation in the thrombomodulin gene, 127G to A cod- 
ing for Ala25Thr, and the risk of myocardial infarc- 
tion in men. Thromb Haemost. 1998;80:743-748. 

85. Wu KK, Aleksic N, Ahn C, Boerwinkle E, Folsom 
AR, Juneja H. Thrombomodulin Ala455Val polymor- 
phism and risk of coronary heart disease. Circulation. 
2001;103:1386-1389. 

86. Topol EJ, McCarthy J, Gabriel S, et al. Single nucleo- 
tide polymorphisms in multiple novel thrombospon- 
din genes may be associated with familial premature 
myocardial infarction. Circulation. 2001;104: 
2641-2644. 

87. Boekholdt SM, Trip MD, Peters RJ, et al. Throm- 
bospondin-2 polymorphism is associated with a re- 
duced risk of premature myocardial infarction. Arte- 
rioscler Thromb Vasc Biol. 2002;22:e24-e27. 

88. Webb KE, Martin JF, Hamsten A, et al. Polymor- 
phisms in the thrombopoietin gene are associated with 
risk of myocardial infarction at a young age. 
Atherosclerosis. 2001;154:703-711. 

89. Kolek MJ, Carlquist JF, Muhlestein JB, et al. Toll- 
like receptor 4 gene Asp299Gly polymorphism is as- 
sociated with reductions in vascular inflammation, an- 
giographic coronary artery disease, and clinical diabetes. 
Am Heart J. 2004;148:1034-1040. 

90. Padovani JC, Pazin-Filho A, Simoes MV, Marin- 
Neto JA, Zago MA, Franco RF. Gene polymorphisms 
in the TNF locus and the risk of myocardial infarction. 
Thromb Res. 2000;100:263-269. 

91. Poirier O, Nicaud V, Gariepy J, et al. Polymor- 
phism R92Q of the tumour necrosis factor receptor 1 
gene is associated with myocardial infarction and ca- 
rotid intima-media thickness-the ECTIM, AXA, EVA 
and GENIC Studies. Eur J Hum Genet. 2004;12:213- 
219. 

92. Alpert JS, Thygesen K, Antman E, Bassand JP. Myo- 
cardial infarction redefined — a consensus document 
of the Joint European Society of Cardiology/ 
American College of Cardiology Committee for the 
redefinition of myocardial infarction. J Am Coll Cardiol. 
2000;36:959-969. 

93. Braunwald E. Unstable angina: a classification. 
Circulation. 1989;80:410-414. 

94. Yan J, Feng J, Hosono S, Sommer SS. Assess- 
ment of multiple displacement amplification in mo- 
lecular epidemiology. Biotechniques. 2004;37: 
136-138, 140-143. 

95. Dean FB, Hosono S, Fang L, et al. Comprehen- 
sive human genome amplification using multiple dis- 
placement amplification. Proc Natl Acad Sci U S A. 
2002;99:5261-5266. 

96. Jurinke C, van den Boom D, Cantor CR, Koster 
H. The use of MassARRAY technology for high 
throughput genotyping. Adv Biochem Eng Biotechnol. 
2002;77:57-74. 

97. Jurinke C, Oeth P, van den Boom D. MALDI- 
TOF mass spectrometry: a versatile tool for high- 
performance DNA analysis. Mol Biotechnol. 2004;26: 
147-164. 

98. Chiodini BD, Barlera S, Franzosi MG, Beceiro VL, 
Introna M, Tognoni G. APO B gene polymorphisms 
and coronary artery disease: a meta-analysis. 
Atherosclerosis. 2003;167:355-366. 

99. Gonzalez-Conejero R, Corral J, Roldan V, et al. 
A common polymorphism in the annexin V Kozak se- 
quence (-1C>T) increases translation efficiency and 
plasma levels of annexin V, and decreases the risk of 
myocardial infarction in young patients. Blood. 2002; 
100:2081-2086. 

100. Hines LM, Stampfer MJ, Ma J, et al. Genetic varia- 
tion in alcohol dehydrogenase and the beneficial effect 
of moderate alcohol consumption on myocardial 
infarction. N Engl J Med. 2001;344:549-555. 

101. Stephens M, Scheet P. Accounting for decay of 
linkage disequilibrium in haplotype inference and miss- 
ing-data imputation. Am J Hum Genet. 2005;76:449- 
462. 

102. Stephens M, Smith NJ, Donnelly P. A new sta- 
tistical method for haplotype reconstruction from popu- 
lation data. Am J Hum Genet. 2001;68:978-989. 

103. Gauderman WJ. Candidate gene association 
analysis for a quantitative trait, using parent- 
offspring trios. Genet Epidemiol. 2003;25:327-338. 

104. Gauderman WJ. Sample size requirements for 
matched case-control studies of gene-environment 
interaction. Stat Med. 2002;21:35-50. 

105. Weng L, Kavaslar N, Ustaszewska A, et al. Lack 
of MEF2A mutations in coronary artery disease. J Clin 
Invest. 2005;115:1016-1020. 

106. Liu S, Ma J, Ridker PM, Breslow JL, Stampfer 
MJ. A prospective study of the association between 
APOE genotype and the risk of myocardial infarction 
among apparently healthy men. Atherosclerosis. 2003; 
166:323-329. 

107. Freely associating. Nat Genet. 1999;22:1-2. 

108. Salanti G, Sanderson S, Higgins JP. Obstacles and 
opportunities in meta-analysis of genetic association 
studies. Genet Med. 2005;7:13-20. 

109. Marchini J, Cardon LR, Phillips MS, Donnelly P. 
The effects of human population structure on large 
genetic association studies. Nat Genet. 2004;36:512- 
517. 

110. Gabriel SB, Schaffner SF, Nguyen H, et al. The 
structure of haplotype blocks in the human genome. 
Science. 2002;296:2225-2229. 






Special Report 



Prediction of Coronary Heart Disease Using Risk 

Factor Categories 

Peter W.F. Wilson, MD; Ralph B. D'Agostino, PhD; Daniel Levy, MD; Albert M. Belanger, BS; 

Halit Silbershatz, PhD; William B. Kannel, MD 

Background — The objective of this study was to examine the association of Joint National Committee (JNC-V) blood 
pressure and National Cholesterol Education Program (NCEP) cholesterol categories with coronary heart disease (CHD) 
risk, to incorporate them into coronary prediction algorithms, and to compare the discrimination properties of this 
approach with other noncategorical prediction functions. 

Methods and Results — This work was designed as a prospective, single-center study in the setting of a community-based 
cohort. The patients were 2489 men and 2856 women 30 to 74 years old at baseline with 12 years of follow-up. During 
the 12 years of follow-up, a total of 383 men and 227 women developed CHD, which was significantly associated with 
categories of blood pressure, total cholesterol, LDL cholesterol, and HDL cholesterol (all P<.001). Sex-specific 
prediction equations were formulated to predict CHD risk according to age, diabetes, smoking, JNC-V blood pressure 
categories, and NCEP total cholesterol and LDL cholesterol categories. The accuracy of this categorical approach was 
found to be comparable to CHD prediction when the continuous variables themselves were used. After adjustment for 
other factors, ≈28% of CHD events in men and 29% in women were attributable to blood pressure levels that exceeded 
high normal (>130/85). The corresponding multivariable-adjusted attributable risk percent associated with elevated 
total cholesterol (>200 mg/dL) was 27% in men and 34% in women. 

Conclusions — Recommended guidelines of blood pressure, total cholesterol, and LDL cholesterol effectively predict CHD 
risk in a middle-aged white population sample. A simple coronary disease prediction algorithm was developed using 
categorical variables, which allows physicians to predict multivariate CHD risk in patients without overt CHD. 
(Circulation. 1998;97:1837-1847.) 

Key Words: coronary disease ■ prediction ■ hypertension ■ cholesterol 
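
The attributable risk percent reported in the abstract can be illustrated with a small calculation. The Python sketch below uses Levin's textbook formula for a single binary exposure, which is an assumption made here for illustration: the abstract reports multivariable-adjusted values and does not state how they were computed, and the prevalence and relative risk in the example are hypothetical, not study estimates.

```python
def attributable_fraction(prevalence, relative_risk):
    """Levin's population attributable fraction for one binary exposure.

    prevalence    -- proportion of the population exposed (0 to 1)
    relative_risk -- risk in the exposed divided by risk in the unexposed
    """
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must be between 0 and 1")
    if relative_risk <= 0.0:
        raise ValueError("relative risk must be positive")
    excess = prevalence * (relative_risk - 1.0)
    return excess / (1.0 + excess)


if __name__ == "__main__":
    # Hypothetical inputs: half the population exposed, exposure doubles the risk.
    fraction = attributable_fraction(prevalence=0.5, relative_risk=2.0)
    print(f"attributable fraction: {fraction:.1%}")
```

The fraction rises with both the prevalence of the exposure and the size of the relative risk, so a common risk factor with a modest effect can account for a large share of events.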



Coronary heart disease continues to be a leading cause of 
morbidity and mortality among adults in Europe and 
North America. 1 Risk factors have included blood pressure, 
cigarette smoking, cholesterol (TC), LDL-C, HDL-C, and 
diabetes. 2-4 Factors such as obesity, left ventricular hypertro- 
phy, family history of premature CHD, and ERT have also 
been considered in defining CHD risk. 5-7 Data from popula- 
tion studies enabled prediction of CHD during a follow-up 
interval of several years, based on blood pressure, smoking 
history, TC and HDL-C levels, diabetes, and left ventricular 
hypertrophy on the ECG. These prediction algorithms have 
been adapted to simplified score sheets that allow physicians 
to estimate multivariable CHD risk in middle-aged patients. 8 

See p 1761 

The present article develops a simplified coronary predic- 
tion model, building on the blood pressure, cholesterol, and 
LDL-C categories proposed by the JNC-V and NCEP ATP 
II. 7,9,10 The analysis evaluates the utility and accuracy of blood 
pressure, cholesterol, and LDL-C recommended categories in 
multivariable CHD prediction, using a Framingham Heart 



Study sample that pooled information for the original and 
offspring cohorts and followed them for 12 years. This 
approach emphasizes the established, powerful, independent, 
and biologically important factors. Family history for heart 
disease, physical activity, and obesity are not included be- 
cause these factors work to a large extent through the major 
risk factors, and their unique contribution to CHD prediction 
can be difficult to quantify. The prediction of initial CHD 
events in a free-living population not on medication is 
emphasized. Consequently, ERT for postmenopausal women, 
treatment of high blood pressure, and therapy for high blood 
cholesterol are not included in the formulations. 

Methods 

The population-based sample used for this report included 2489 men 
and 2856 women 30 to 74 years old at the time of their Framingham 
Heart Study examination in 1971 to 1974. Participants attended 
either the 11th examination of the original Framingham cohort 11 or 
the initial examination of the Framingham Offspring Study. 12 Similar 
research protocols were used in each study, and persons with overt 
CHD at the baseline examination were excluded. 



From the Framingham Heart Study, National Heart, Lung, and Blood Institute, Framingham, Mass (P.W.F.W., D.L.); Boston University Mathematics 
Department, Boston, Mass (R.B.D., A.M.B., H.S.); and Framingham Heart Study, Boston University School of Medicine, Framingham, Mass (W.B.K.). 
Reprint requests to Dr Peter W.F. Wilson, Framingham Heart Study, National Heart, Lung, and Blood Institute, 5 Thurber St, Framingham, MA 01701. 
E-mail peter@fram.nhlbi.nih.gov
Score sheets are on the Internet at http://www.nhlbi.nih.gov/nhlbi/fram/
© 1998 American Heart Association, Inc. 



Downloaded from circ.ahajournals.org by on October 16, 2007



1838 Prediction of Coronary Heart Disease 



Selected Abbreviations and Acronyms 
CHD = coronary heart disease 
ERT = estrogen replacement therapy 
HDL-C = HDL cholesterol 

JNC-V = Fifth Joint National Committee on Hypertension 
LDL-C = LDL cholesterol 
NCEP ATP II = National Cholesterol Education Program, Adult 
Treatment Panel II 
TC = total cholesterol 
VLDL-C = VLDL cholesterol 



At the 1971-1974 examination, a medical history was taken and a 
physical examination was performed by a physician. Persons who 
smoked regularly during the previous 12 months were classified as 
smokers. Height and weight were measured, and body mass index 
(kg/m²) was calculated. Two blood pressure determinations were made
after the participant had been sitting at least 5 minutes, and the average
was used for analyses. Hypertension was categorized according to blood
pressure readings by JNC-V definitions 10: optimal (systolic
<120 mm Hg and diastolic <80 mm Hg), normal blood pressure
(systolic 120 to 129 mm Hg or diastolic 80 to 84 mm Hg), high normal
blood pressure (systolic 130 to 139 mm Hg or diastolic 85 to
89 mm Hg), hypertension stage I (systolic 140 to 159 mm Hg or
diastolic 90 to 99 mm Hg), and hypertension stage II-IV (systolic ≥160
or diastolic ≥100 mm Hg). When systolic and diastolic pressures fell
into different categories, the higher category was selected for the 
purposes of classification. Blood pressure categorization was made 
without regard to the use of antihypertensive medication. 
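
The categorization rule described above can be sketched in a few lines of Python. This is only an illustration of the stated cutoffs and of the "higher category wins" rule; the names `BPCategory` and `classify_blood_pressure` are hypothetical and do not come from the study.

```python
from enum import IntEnum


class BPCategory(IntEnum):
    """JNC-V categories as listed in the text, ordered from lowest to highest."""
    OPTIMAL = 0
    NORMAL = 1
    HIGH_NORMAL = 2
    STAGE_I = 3
    STAGE_II_IV = 4


def _category_from_cutoffs(value, cutoffs):
    """Return the first category whose upper bound exceeds the reading."""
    for upper_bound, category in cutoffs:
        if value < upper_bound:
            return category
    return BPCategory.STAGE_II_IV


# Upper bounds (mm Hg) taken from the definitions in the text.
SYSTOLIC_CUTOFFS = [(120, BPCategory.OPTIMAL), (130, BPCategory.NORMAL),
                    (140, BPCategory.HIGH_NORMAL), (160, BPCategory.STAGE_I)]
DIASTOLIC_CUTOFFS = [(80, BPCategory.OPTIMAL), (85, BPCategory.NORMAL),
                     (90, BPCategory.HIGH_NORMAL), (100, BPCategory.STAGE_I)]


def classify_blood_pressure(systolic, diastolic):
    """Classify a reading; when the two pressures disagree, use the higher category."""
    sys_cat = _category_from_cutoffs(systolic, SYSTOLIC_CUTOFFS)
    dia_cat = _category_from_cutoffs(diastolic, DIASTOLIC_CUTOFFS)
    return max(sys_cat, dia_cat)
```

For a reading of 146/88 mm Hg, the systolic value falls in stage I and the diastolic value in high normal, so the sketch returns stage I.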

Diabetes was considered present if the participant was under treat- 
ment with insulin or oral hypoglycemic agents, if casual blood glucose 
determinations exceeded 150 mg/dL at two clinic visits in the original 
cohort, or if fasting blood glucose exceeded 140 mg/dL at the initial 
examination of the Offspring Study participants. Blood was drawn at the 
baseline examination after an overnight fast, and EDTA plasma was 
used for all cholesterol and triglyceride measurements. Cholesterol was 
determined according to the Abell-Kendall technique, 13 and HDL-C was 
measured after precipitation of VLDL and LDL proteins with heparin- 
magnesium according to the Lipid Research Clinics Program protocol. 14 
When triglycerides were <400 mg/dL, the concentration of LDL-C was
estimated indirectly by use of the Friedewald formula 15; for triglycerides
≥400 mg/dL, the LDL-C was estimated directly after ultracentrifugation
of plasma and measurement of cholesterol in the bottom fraction
(plasma density <1.006). 16 
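
The text cites the Friedewald formula without writing it out. A minimal sketch follows, assuming the commonly used form for values in mg/dL (LDL-C = TC - HDL-C - triglycerides/5); the function name `friedewald_ldl` is hypothetical, and the 400 mg/dL limit mirrors the threshold stated above.

```python
def friedewald_ldl(total_cholesterol, hdl_cholesterol, triglycerides):
    """Estimate LDL-C (mg/dL) indirectly; all inputs are fasting values in mg/dL.

    Assumes the usual form of the Friedewald estimate, in which
    triglycerides/5 stands in for VLDL cholesterol.
    """
    if triglycerides >= 400:
        # The text uses direct measurement after ultracentrifugation here.
        raise ValueError("indirect estimate not used for triglycerides >= 400 mg/dL")
    return total_cholesterol - hdl_cholesterol - triglycerides / 5.0
```

As an illustration with made-up inputs, TC 200, HDL-C 50, and triglycerides 150 mg/dL give 200 - 50 - 30 = 120 mg/dL.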

Cutoffs for TC (<200, 200 to 239, 240 to 279, and ≥280 mg/dL),
LDL-C (<130, 130 to 159, and ≥160 mg/dL), HDL-C (<35, 35 to
59, and ≥60 mg/dL), cigarette smoking, diabetes, and age were
considered in this report. The cholesterol and LDL-C cutoffs are
similar to those used for the NCEP ATP II guidelines and were partly
dictated by the number of persons with higher levels of TC or
LDL-C. For those reasons, we have provided information for
cholesterol categories of 240 to 279 and ≥280 mg/dL and for LDL-C
≥160 mg/dL. Too few persons had LDL-C ≥190 mg/dL to provide
stable estimates for CHD risk. Study subjects were followed up over 
a 12-year period for the development of CHD (angina pectoris, 



recognized and unrecognized myocardial infarction, coronary insufficiency,
and coronary heart disease death) according to previously
published criteria. "Hard CHD" events included total CHD without 
angina pectoris. 17 Surveillance for CHD consisted of regular exam- 
inations at the Framingham Heart Study clinic and review of medical 
records from outside physician office visits and hospitalizations. 

Statistical tests included age-adjusted linear regression or logistic 
regression to test for trends across blood pressure, TC, LDL-C, and 
HDL-C categories. 18 Age-adjusted Cox proportional hazards regres- 
sion and its accompanying c statistic were used to test for the relation 
between various independent variables and the CHD outcome and to 
evaluate the discriminatory ability of various prediction models. 19,20 
The 12-year follow-up was used in the proportional hazards models, 
and results were adapted to provide 10-year CHD incidence esti- 
mates. Separate score sheets were developed for each sex using TC 
and LDL-C categories. These sheets adapted the results of proportional
hazards regressions by use of a system that assigned points for
each risk factor based on the value for the corresponding β-coefficient
of the regression analyses.

The relative risk, but not the attributable risk, for TC and CHD 
declines with advancing age. 21 Quadratic terms for age were consid- 
ered in the models for the score sheets. Furthermore, CHD risk is 
associated with HDL-C in the elderly, 22-24 and interaction terms for
TC and age were also considered in the development of the
prediction models. 22 Among women, an age-squared term was found
to be significant in the prediction models and was incorporated into
the score sheets. Neither age×TC nor age×LDL-C was found to be
significant in either sex. 

Score sheets for prediction of CHD using TC and LDL-C 
categorical variables were developed from the β-coefficients of Cox
proportional hazards models. The TC range was expanded in
40-mg/dL increments to include <160 mg/dL and ≥280 mg/dL, the
HDL-C range 35 to 59 mg/dL was partitioned to provide three levels 
for each sex, and both optimal and normal blood pressure categories 
were included. The score sheets provide comparison 10-year abso- 
lute risks for persons of the same age and sex for average total CHD, 
average hard CHD (total CHD without angina pectoris), and low-risk 
total CHD. Risk factors are shaded, ranging from very low relative 
risk to very high. Such distinctions are arbitrary but provide a 
foundation to determine the need for clinical intervention. 

Results 

At initial examination, study subjects ranged in age from 30 to 
74 years, and the mean age±SD was 48.6±11.7 years for 2489
men and 49.8±12.0 years for 2856 women. Because there were
relatively few persons at the higher stages of hypertension in the
Framingham sample, stages II, III, and IV hypertension were
combined into a single category in the analyses (Table 1). 
Approximately half of the subjects for each sex had blood 
pressure levels in the normal or optimal range. 

The age-adjusted means for various risk factors according to 
blood pressure categories are shown for men and women in Table 
2. Therapy for hypertension (P<.001 men, P<.001 women), more
frequent diabetes (P<.001 men, P<.001 women), greater body



TABLE 1. Characteristics of Participants According to JNC-V
Hypertension Categories*

Blood Pressure               Systolic, mm Hg   Diastolic, mm Hg   Men, %   Women, %
Normal (including optimal)   <130              <85                44       55
High normal                  130-139           85-89              20       15
Hypertension stage I         140-159           90-99              23       19
Hypertension stage II-IV     ≥160              ≥100               13       11



ignoring blood pressure therapy. 




Wilson et al May 12, 1998 1839



TABLE 2. Age-Adjusted Mean Levels and Prevalence of Risk Factors According to Blood
Pressure Category

                             Not Hypertensive             Hypertensive
                             Normal        High Normal    Stage I       Stage II-IV   P, Test for Trend*
Men                          (n=1097)      (n=500)        (n=567)       (n=325)
  Hypertensive therapy, %    1.6           2.7            10.1          25.0          <.001
  Body mass index, kg/m²     25.8          26.7           27.5          28.3          <.001
  Cigarette use, %           43.1          41.8           35.4          38.2          .010
  Diabetes, %                [illegible]   [illegible]    4.0           11.2          <.001
  TC, mg/dL                  210.1         214.3          218.0         213.9         .004
  LDL-C, mg/dL               149.7         143.4          144.5         139.7         .638
  HDL-C, mg/dL               44.4          45.7           44.8          44.5          .674
Women                        (n=1578)      (n=424)        (n=535)       (n=319)
  Hypertensive therapy, %    3.9           9.4            18.0          33.6          <.001
  Body mass index, kg/m²     23.9          25.8           26.3          26.9          <.001
  Cigarette use, %           39.4          37.3           33.9          35.9          .071
  Diabetes, %                2.6           3.4            4.9           9.8           <.001
  TC, mg/dL                  214.1         223.0          224.4         218.5         <.001
  LDL-C, mg/dL               138.3         143.9          146.8         138.9         .031
  HDL-C, mg/dL               58.6          58.2           55.9          55.7          <.001

*Test for linear trend across blood pressure categories after age adjustment. For dichotomous variables, logistic regression was
done.



mass index (P<.001 men, P<.001 women), and higher TC level
(P=.004 men, P<.001 women) were consistently associated
with higher blood pressure categories in both sexes. Cigarette
smoking was inversely associated with blood pressure in men
(P=.010), but only a borderline association was present in
women (P=.071). The lipoprotein fractions HDL-C
(P<.001) and LDL-C (P=.031) were significantly associated
with blood pressure category in women but not in men. 

Age-adjusted 10-year CHD rates for blood pressure and 
cholesterol categories are shown for men and women in Table 3. 
In prediction models, the CHD rates were significantly associ- 
ated with the specified categories of blood pressure, TC, 
HDL-C, and LDL-C (all P<.001 for both sexes). The number of
CHD events arising at each blood pressure and cholesterol 
category is also given. For blood pressure, the greatest number 
of CHD cases arose from the stage I hypertension category for 
both sexes. Conversely, the greatest number of CHD cases arose 
from the highest lipoprotein cholesterol levels (LDL-C ≥160
mg/dL or cholesterol ≥240 mg/dL).

Multivariable risk calculations for TC categories are shown in
Table 4. Normal or optimal blood pressure was used as the
reference level, and estimated relative risk rose from 1.00 for normal
or optimal blood pressure to 1.84 in men and 2.12 in women with
stage II-IV hypertension. Similarly, for TC, the estimated relative
risk rose from 1.00 for levels <200 mg/dL to 1.90 in men and 1.72
in women with TC ≥240 mg/dL. When typical HDL-C levels (35
to 59 mg/dL) were used as a reference, CHD risk was increased
among men and women with low HDL-C (<35 mg/dL) and CHD
risk was correspondingly decreased among subjects with high
HDL-C (≥60 mg/dL). The population-attributable risk percent
associated with hypertension was 6% for high normal, 13% for
stage I, and 9% for stage II-IV hypertension among men. The
corresponding values were 5% for high normal, 13% for stage I,
and 12% for stage II-IV hypertension among women. An overall
estimate of the attributable risk percent for blood pressure level
greater than normal was 28% in men and 29% in women. When
cholesterol <200 mg/dL was used as the reference range, attributable
risks were 10% for TC 200 to 239 mg/dL and 17% for TC
≥240 mg/dL in men and 12% for TC 200 to 239 mg/dL and 22%
for TC ≥240 mg/dL in women. The overall estimate of the
attributable risk percent for TC level ≥200 mg/dL was 27% in men
and 34% in women.

Multivariable risk calculations for LDL-C categories are 
shown in Table 5, and these results parallel the presentation in 
Table 4. When LDL-C <130 mg/dL is used as the reference 
range, a greater absolute CHD risk is associated with higher 
LDL-C categories, but the magnitude of the relative risk and 
its statistical significance are very similar to that observed for 
the categories of TC (Table 4). 

The efficacy of prediction with continuous variables was 
compared with that obtained with categorical variables and a risk 
factor sum (Figs 1 and 2 for men and women, respectively). For 
calculation of the risk factor sum, the levels considered were age 
(≥45 years for men, ≥55 years for women), hypertension
(systolic blood pressure ≥140 mm Hg, diastolic blood pressure
≥90 mm Hg, or use of antihypertensive medication), smoking,
diabetes, elevated cholesterol (cholesterol ≥240 mg/dL or
LDL-C ≥160 mg/dL), and HDL-C <35 mg/dL. One point was
given for each risk factor, for a possible score of 0 to 7 points. 
A greater area under the curve indicated better predictive 
capability. The curves were nearly identical for the continuous 
and categorical formulations, TC and LDL-C categories had 
similar effects, and the risk factor sums tended to have the lowest 
predictive potential. The c statistic, a measure of the discrimi- 
natory ability of a model, equal to the area under the receiver 
operating characteristic curve, provides a guide to interpret the 






TABLE 3. CHD Risk According to Blood Pressure and Lipid Categories

                              Men                                        Women
                              Person-       No. of        Age-Adjusted   Person-       No. of        Age-Adjusted
                              Years         Events (%)    10-Year Rate   Years         Events (%)    10-Year Rate
Total                         30 154        [illegible]                  38 057        227 (100)
Blood pressure
  Normal (including optimal)  [illegible]   [illegible]   [illegible]    20 747        66 (29)       2.9
  High normal                 [illegible]   77 (20)       12.4           6056          36 (16)       7.1
  Hypertension stage I        6695          115 (30)      16.0           7254          72 (32)       13.9
  Hypertension stage II-IV    [illegible]   [illegible]   [illegible]    4000          53 (23)       14.1
TC, mg/dL
  <200                        [illegible]   [illegible]   8.2            13 289        39 (17)       3.1
  200-239                     [illegible]   [illegible]   12.0           12 683        80 (35)       6.6
  ≥240                        [illegible]   [illegible]   [illegible]    [illegible]   [illegible]   [illegible]
HDL-C, mg/dL
  <35                         5601          97 (25)       15.8           1506          23 (10)       14.7
  35-59                       21 151        260 (68)      12.0           20 788        146 (64)      7.5
  ≥60                         3409          26 (7)        8.2            15 761        58 (26)       3.9
LDL-C, mg/dL
  <130                        11 142        104 (27)      7.3            15 835        50 (22)       2.3
  130-159                     10 384        124 (32)      11.3           10 455        64 (28)       6.5
  ≥160                        8628          155 (41)      17.3           11 767        113 (50)      10.6

The age-adjusted 10-year CHD rates were calculated from the Cox proportional hazards model, based on 12 years of follow-up.



results plotted in Figs 1 and 2. The c statistics associated with TC 
categories were 0.74 in men and 0.77 in women for continuous 
variables by proportional hazards or accelerated failure models, 11 
0.73 in men and 0.76 in women for categorical variables, and 
0.69 in men and 0.72 in women for the risk factor sum. The 



corresponding c statistics associated with LDL-C categories 
were 0.74 in men and 0.77 in women for continuous variables by 
proportional hazards or accelerated failure models, 11 0.73 in men 
and 0.77 in women for categorical variables, and 0.68 in men 
and 0.71 in women for the risk factor sum. 
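
As a rough illustration of what the c statistic measures, the sketch below computes it for a binary outcome as the proportion of case/non-case pairs in which the case received the higher predicted score, with ties counted as one half. This equals the area under the receiver operating characteristic curve. It is a simplified sketch that ignores follow-up time and censoring, so it is not the computation used with the proportional hazards models in the study; the function name `c_statistic` and the toy data are hypothetical.

```python
def c_statistic(scores, events):
    """Area under the ROC curve by pairwise comparison.

    scores: predicted risk (or point total) for each person.
    events: 1 if the person developed the outcome, 0 otherwise.
    """
    cases = [s for s, e in zip(scores, events) if e]
    non_cases = [s for s, e in zip(scores, events) if not e]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")

    concordant = 0.0
    for case_score in cases:
        for other_score in non_cases:
            if case_score > other_score:
                concordant += 1.0      # model ranked the case higher
            elif case_score == other_score:
                concordant += 0.5      # tie counts as half
    return concordant / (len(cases) * len(non_cases))


# Toy data only: four people, two of whom developed the outcome.
toy_scores = [0.2, 0.8, 0.5, 0.5]
toy_events = [0, 1, 1, 0]
toy_c = c_statistic(toy_scores, toy_events)  # 3.5 of 4 pairs concordant
```

A value of 0.5 corresponds to no discrimination and 1.0 to perfect separation of cases from non-cases, which is the scale on which the values of 0.68 to 0.77 reported above should be read.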



TABLE 4. Multivariable-Adjusted Relative Risks for CHD According to
TC Categories

                              Men                          Women
                              Relative Risk   95% CI       Relative Risk   95% CI
Age, y                        1.05*           1.04-1.06    1.04*           1.03-1.06
Blood pressure
  Normal (including optimal)  1.00            Referent     1.00            Referent
  High normal                 1.31            0.98-1.76    1.30            0.86-1.98
  Hypertension stage I        1.67†           1.28-2.18    1.73*           1.19-2.52
  Hypertension stage II-IV    1.84*           1.37-2.49    2.12*           1.42-3.17
Cigarette use (y/n)           1.68*           1.37-2.06    1.47*           1.12-1.94
Diabetes (y/n)                1.50*           1.06-2.13    1.77*           1.16-2.69
TC, mg/dL
  <200                        1.00            Referent     1.00            Referent
  200-239                     1.31*           1.01-1.68    1.51*           1.01-2.24
  ≥240                        1.90*           1.47-2.47    1.72*           1.15-2.56
HDL-C, mg/dL
  <35                         1.47*           1.16-1.86    2.02*           1.29-3.15
  35-59                       1.00            Referent     1.00            Referent
  ≥60                         0.56*           0.37-0.83    0.58‡           0.43-0.79

The multivariate models were performed separately for men and women. Each model included
simultaneously all variables listed in the table. All analyses used categorical variables.
*.01<P<.05, †.001<P<.01, ‡P<.001.






TABLE 5. Multivariate-Adjusted Relative Risks for CHD According to
LDL-C Categories

                              Men                          Women
                              Relative Risk   95% CI       Relative Risk   95% CI
Age, y                        1.05*           1.04-1.06    1.04*           1.03-1.06
Blood pressure
  Normal (including optimal)  1.00            Referent     1.00            Referent
  High normal                 1.32            0.98-1.78    1.34            0.88-2.05
  Hypertension stage I        1.73‡           1.32-2.26    1.75‡           1.21-2.54
  Hypertension stage II-IV    [illegible]     [illegible]  2.19†           1.46-3.27
Cigarette use (y/n)           1.71‡           1.39-2.10    1.49†           1.13-1.97
Diabetes (y/n)                1.47*           1.04-2.08    1.80†           1.18-2.74
LDL-C, mg/dL
  <130                        1.00            Referent     1.00            Referent
  130-159                     1.19            0.91-1.54    1.24            0.84-1.81
  ≥160                        1.74*           1.36-2.24    1.68†           1.17-2.40
HDL-C, mg/dL
  <35                         1.46†           1.15-1.85    2.08†           1.33-3.25
  35-59                       1.00            Referent     1.00            Referent
  ≥60                         0.61*           0.41-0.91    0.64†           0.47-0.87

The multivariate models were performed separately for men and women. Each model included
simultaneously all variables listed in the table. All analyses used categorical variables.
*.01<P<.05, †.001<P<.01, ‡P<.001.



Score sheets were developed to predict CHD in men (Fig 
3) and women (Fig 4) from the β-coefficients of Cox
proportional hazards models (Table 6). Among women, an 
age-squared term was found to be significant and was 
incorporated into the score sheets. The average CHD risk 
over a period of 10 years tends to plateau slightly in the oldest 
men and women. 

An illustrative example for Fig 3 follows. The subject is a 
55-year-old man with a TC of 250 mg/dL, HDL-C of 39 
mg/dL, and blood pressure of 146/88 who is diabetic and a 
nonsmoker. Proceeding through the steps gives us the following
results: Step 1: Age 55=4 points. Step 2: TC 250
mg/dL=2 points. Step 3: HDL-C 39 mg/dL=1 point. Step 4:
Blood pressure 146/88 mm Hg=2 points. Step 5: Diabetic=2
points. Step 6: Nonsmoker=0 points. Step 7: Point total was 
4+2+1+2+2+0=11. Step 8: Estimated 10-year CHD risk 
is 31%. Step 9: The average and "low-risk" risks of CHD 
over a period of 10 years for a 55-year-old man are 16% and 
7%, respectively (low risk was calculated for a person the 
same age, optimal blood pressure, TC 160 to 199 mg/dL, 
HDL-C 45 mg/dL for men or 55 mg/dL for women, non- 
smoker, and no diabetes). Dividing the subject's risk by the 




Figure 1. Receiver operating characteristic curves for prediction of CHD in Framingham men over a period of 12 years. Separate plots were used for continuous, categorical, and risk factor sum models, according to whether TC or calculated LDL-C was used. [Horizontal axis: False Positive.]






Figure 2. Receiver operating characteristic curves for prediction of CHD in Framingham women over a period of 12 years. Separate plots were used for continuous, categorical, and risk factor sum models, according to whether TC or calculated LDL-C was used. [Curves plotted: Categories - Total, Categories - LDL, Continuous - Total, Continuous - LDL, Score - Total, Score - LDL. Horizontal axis: False Positive.]



average risk provides an estimate of the relative risk: 31% 
divided by 16%= 1.94. Use of the LDL-C approach in the 
score sheets is appropriate when fasting LDL-C estimates are 
available, by use of ultracentrifugation techniques, the 
Friedewald formula, or newer LDL-C assays. 15,25,26 The ap- 
proach is analogous to that shown for TC categories. 
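
The arithmetic of the illustrative example can be retraced in a short Python sketch. It uses only the point values and percentages quoted in the text for this one subject; it does not reproduce the score sheet itself, and the variable names are hypothetical.

```python
# Points quoted in the text for the 55-year-old man in the example (Fig 3, TC version).
example_points = {
    "age 55 y": 4,
    "TC 250 mg/dL": 2,
    "HDL-C 39 mg/dL": 1,
    "blood pressure 146/88 mm Hg": 2,
    "diabetic": 2,
    "nonsmoker": 0,
}

# Step 7: add up the points from steps 1 through 6.
point_total = sum(example_points.values())

# Steps 8 and 9: percentages as quoted in the text for this subject.
subject_risk_pct = 31   # estimated 10-year CHD risk for the point total
average_risk_pct = 16   # average 10-year risk for a 55-year-old man
low_risk_pct = 7        # "low-risk" 10-year risk for a 55-year-old man

# Relative risk versus the average person of the same age and sex.
relative_risk = subject_risk_pct / average_risk_pct

print(f"Point total: {point_total}")
print(f"Relative risk vs average: {relative_risk:.2f}")
```

The sketch gives a point total of 11 and a relative risk of 1.94, matching the figures in the text.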

Discussion 

For the past two decades it has been possible to estimate CHD 
risk by use of regression equations derived from observa- 
tional studies, and the present study demonstrates similar 
results, predicting later CHD in a middle-aged white popula- 
tion sample. Prediction models have typically been based on 
the logistic function, although the Weibull distribution has 
also been used. 11,22 Formulations have often included age, sex, 
blood pressure, TC, HDL-C, smoking, diabetes, and left 
ventricular hypertrophy. 11 The prediction of CHD has taken 
the form of sex-specific equations that were developed from 
a single study and applied to other populations or individuals. 
Age, TC, HDL-C, and blood pressure were used in the 
equations as continuous variables, in contrast to dichotomous 
variables (yes/no) such as smoking, diabetes, and left ven- 
tricular hypertrophy. 

The present study builds on the prior experience of CHD 
prediction with continuous variables and integrates the cate- 
gorical approaches that have become part of the framework of 
blood pressure (JNC-V) and cholesterol (NCEP) programs in 
the United States. 6,7,10 As suggested in an earlier NCEP
report, 27 our approach integrates blood pressure and cholesterol
information and estimates both relative and absolute
CHD risk with a risk factor weighting approach.

The NCEP ATP II guidelines defined hypertension as a 
yes/no variable, and it can be seen from Tables 3, 4, and 5 that 
additional blood pressure categories are important in predict- 



ing CHD risk. Higher levels of blood pressure are typically 
associated with abnormal cholesterol levels, greater body 
mass index, and an increased prevalence of diabetes (Table 
2). Data from Tables 3 and 4 demonstrate that blood pressure, 
TC, LDL-C, and HDL-C categories are predictive of CHD 
and suggest that risk factor prevention and intervention 
programs should be integrated, as recently suggested. 28-30
Three reasons probably account for similar results when 
continuous or categorical formulations are used: (1) a large 
enough number of categories has been used to adequately 
describe the clinical data; (2) coronary prediction equations 
have limitations in their precision and accuracy; and (3) in the 
final steps of the prediction score sheet, the data are summarized
by use of point score totals, providing fewer than 20
combinations for CHD risk prediction. 

The predictive capability of the continuous model de- 
scribed here is similar to the accelerated failure model used in 
an earlier Framingham CHD prediction equation, 11 and the 
continuous variable and categorical variable approaches have 
c statistic values that are nearly identical, suggesting that
predictability of the models is nearly the same in either 
instance. This result is in contradistinction to a comparison of 
the NCEP ATP II algorithm (<10 unique patterns) with a 
continuous variable approach in which the latter (using 
Framingham models) was thought to be statistically superi- 
or. 29 A risk factor sum model, considering 7 dichotomous 
variables, was used for comparison in the present study and 
showed a significant falloff in the level of the c statistic with 
this approach compared with formulations using categorical 
or continuous levels. 

TC- and LDL-C-based approaches, whether continuous or
categorical variables are used, are similar in their ability to 
predict initial CHD events in the models presented. This may 
result from indirect estimation of LDL-C, leading to reduced 






[Figure 3 score sheet; the tabulated point values are not recoverable from the scan. Panels, each giving LDL Pts and Chol Pts: Step 1, age; Step 2, LDL-C or cholesterol; Step 3, HDL-C; Step 4, blood pressure (when systolic and diastolic pressures provide different estimates for point scores, use the higher number); Step 5, diabetes; Step 6, smoker; Step 7, adding up the points (sum from steps 1-6); Step 8, determine CHD risk from point total; Step 9, compare to average person your age (average 10-year CHD risk, average 10-year hard* CHD risk, low** 10-year CHD risk). A color key grades relative risk from very low to very high.]

* Hard CHD events exclude angina pectoris.
** Low risk was calculated for a person the same age, optimal blood pressure, LDL-C 100-129 mg/dL or cholesterol 160-199 mg/dL, HDL-C 45 mg/dL for men or 55 mg/dL for women, non-smoker, no diabetes.

Risk estimates were derived from the experience of the Framingham Heart Study, a predominantly Caucasian population in Massachusetts, USA.



Figure 3. CHD score sheet for men using TC or LDL-C categories. Uses age, TC (or LDL-C), HDL-C, blood pressure, diabetes, and 
smoking. Estimates risk for CHD over a period of 1 0 years based on Framingham experience in men 30 to 74 years old at baseline. 
Average risk estimates are based on typical Framingham subjects, and estimates of idealized risk are based on optimal blood pressure, 
TC 160 to 199 mg/dL (or LDL 100 to 129 mg/dL), HDL-C of 45 mg/dL in men, no diabetes, and no smoking. Use of the LDL-C catego- 
ries is appropriate when fasting LDL-C measurements are available. Pts indicates points. 



accuracy and precision of LDL-C estimates from single blood 
measurements. 35,32 The CHD estimates in the present article 
represent the experience of a free-living population sample, 
and different results may be obtained when blood pressure or 
blood cholesterol has been treated aggressively. 

Although the impact of TC and LDL-C on estimates of CHD 
risk is similar in Framingham data, such results may be more 
relevant to populations than to individuals. Extensive clinical 
data and clinical trial results suggest that LDL-C is the major 
atherogenic lipoprotein and that measurement of LDL-C levels 
in the clinical setting provides an advantage. 33-35 High or low 



levels of HDL-C within individuals can produce discrepancies 
between TC and LDL-C levels. In addition, TC and LDL-C 
levels are not always concordant in persons with hypertriglyceridemia.
Thus, measurement of TC is only a crude surrogate for
LDL-C in risk assessment or in estimating initial response to 
therapy, although it can be useful in initial detection or long-term 
monitoring of response. 31 

Several candidate variables were not used in the predic- 
tion equations. A family history of premature CHD, 
previously shown in the Framingham Study to increase the 
relative odds of CHD to te 1.3, 36 was not uniformly 






[Figure 4 score sheet; the tabulated point values are not recoverable from the scan. Panels, each giving LDL Pts and Chol Pts: Step 1, age; Step 2, LDL-C or cholesterol; Step 3, HDL-C; Step 4, blood pressure (when systolic and diastolic pressures provide different estimates for point scores, use the higher number); Step 5, diabetes; Step 6, smoker; Step 7, adding up the points (sum from steps 1-6); Step 8, determine CHD risk from point total; Step 9, compare to average person your age (average 10-year CHD risk, average 10-year hard* CHD risk, low** 10-year CHD risk). A color key grades relative risk from very low to very high.]

* Hard CHD events exclude angina pectoris.
** Low risk was calculated for a person the same age, optimal blood pressure, LDL-C 100-129 mg/dL or cholesterol 160-199 mg/dL, HDL-C 45 mg/dL for men or 55 mg/dL for women, non-smoker, no diabetes.

Risk estimates were derived from the experience of the Framingham Heart Study, a predominantly Caucasian population in Massachusetts, USA.



Figure 4. CHD score sheet for women using TC or LDL-C categories. Uses age, TC, HDL-C, blood pressure, diabetes, and smoking. 
Estimates risk for CHD over a period of 10 years based on Framingham experience in women 30 to 74 years old at baseline. Average 
risk estimates are based on typical Framingham subjects, and estimates of idealized risk are based on optimal blood pressure, TC 160 
to 199 mg/dL (or LDL 100 to 129 mg/dL), HDL-C of 55 mg/dL in women, no diabetes, and no smoking. Use of the LDL-C categories is 
appropriate when fasting LDL-C measurements are available. Pts indicates points. 



available among the second-generation participants. Fibrinogen is
now recognized as a CHD risk factor, 37 and levels were available
for ≈1000 original cohort participants at a 1968-70 examination, 38,39
but fibrinogen measurements were not available for the Offspring
Study participants. In addition, established methods for measuring
fibrinogen are lacking, and the precise mechanism linking elevated
fibrinogen levels to CHD is unclear. Other risk factors, such as
smoking, diabetes, and hypertension, are often associated with
abnormal fibrinogen levels, and fibrinogen measurements vary greatly
within individuals. 37,40 Left ventricular hypertrophy on the ECG was
used in previous CHD prediction algorithms, but it is highly
associated with hypertension and was not included in the



present formulation for a variety of reasons, including lack
of standard universally accepted ECG criteria. 11

Postmenopausal ERT was not used in the prediction algorithm,
because estrogen dose was typically higher in the early 1970s 41
and the cardioprotective effects of hormonal replacement therapy
that have been universally observed in more recent times 42-45 were
not experienced by all Framingham women from the early 1970s to
the mid 1980s. 46-48

Persons who exercise typically have a lower risk of CHD. 49-51
Information on physical activity was not available at the baseline
examinations used to develop this CHD risk prediction algorithm,
but cigarette smoking, low HDL-C levels, and diabetes are less
common among those who are physically active. 52-55 Regular and
vigorous exercise is often



Downloaded from circ.ahajournals.org by on October 16, 2007 



Wilson et al May 12, 1998 1845 



associated with higher levels of HDL-C, an important determinant
for reduced CHD risk. 56-58 Similarly, body mass
index, an obesity index that expresses weight in kilograms 
divided by height in meters squared, has been considered a 
candidate variable for the CHD prediction algorithm. Greater 
obesity has been associated with higher TC, lower HDL-C, 
higher blood pressure, and diabetes, and the residual impact 
of obesity on CHD has typically been slight after incorpora- 
tion of these other variables into the regression model. 8 

Clinicians should exercise caution in generalizing from 
experience of the Framingham Study, a community sample of 
white subjects drawn from a suburb west of Boston. Use of 
the prediction models would be most appropriate for individ- 
uals who resemble the study sample. However, reasonable 
accuracy in predicting CHD has been demonstrated in the 
past, when earlier Framingham CHD prediction equations 
were applied to population samples from Honolulu, Puerto 
Rico, Albany, Chicago, Los Angeles, Minneapolis, Tecumseh,
the Western Collaborative Group, and a national cohort. 59-62
Follow-up from the Framingham Study was also
used to estimate CHD experience in men participating in the 
Multiple Risk Factor Intervention Trial. 63 

Coronary prediction estimates tend to be most reliable 
when the data are most concentrated and can be particularly 
useful when subjects have multiple mild abnormalities that 
act synergistically to increase CHD risk. It is uncommon for 
persons to have four or five risk factors, and estimates of 
CHD risk tend to be more precise for individuals with fewer 
risk factors. Score sheet approaches have been used to target 
persons for the primary prevention of coronary disease by use 
of a tabular format called a Sheffield table, in which the 
estimated absolute risk for CHD is used to establish a 
threshold for aggressive intervention. 64 The average CHD 
rates reported in those tables are roughly comparable to the 
myocardial infarction and coronary death rates among mid- 
dle-aged men who participated in the West of Scotland trial 
of cholesterol lowering. 35,65 In contrast, our prediction equa- 
tions estimate coronary disease risk over a period of 10 years 
for a larger age range and include total CHD (angina pectoris, 
myocardial infarction, and coronary death). 

A study that considered CHD prediction using TC, LDL-C, 
TC/HDL-C ratio, and LDL-C/HDL-C ratio 66 concluded that 
"total cholesterol/HDL is a superior measure of risk for CHD 
compared with either total cholesterol or LDL cholesterol, 
and that current practice guidelines could be more efficient if 
risk stratification was based on this ratio rather than primarily 
on the LDL cholesterol level." Such an approach appears 
attractive, but at the extremes of the TC or LDL-C distribu- 
tion, equal ratios may not signify the same CHD risk. 
Moreover, use of a ratio may make it harder for the physician 
to focus on the separate values for TC, LDL-C, and HDL-C 
that have to be borne in mind to make appropriate clinical 
decisions concerning therapy. The current approach builds on 
established blood pressure (JNC-V) and cholesterol (NCEP 
ATP II) foundations, requires fasting samples only if LDL-C 
score sheets are used, and is easy to implement as part of a 
screening program. 

Estimation of CHD and other cardiovascular events is a 
dynamic field. The present formulation has attempted to provide 



TABLE 6. β-Coefficients Underlying CHD Prediction Sheets
Using TC Categories

Variable                                       Men         Women
Age, y                                         0.04826     0.33766
Age squared, y                                             -0.00268
TC, mg/dL
  <160                                        -0.65945    -0.26138
  160-199                                      Referent    Referent
  200-239                                      0.17692     0.20771
  240-279                                      0.50539     0.24385
  ≥280                                         0.65713     0.53513
HDL-C, mg/dL
  <35                                          0.49744     0.84312
  35-44                                        0.24310     0.37796
  45-49                                        Referent    0.19785
  50-59                                       -0.05107     Referent
  ≥60                                         -0.48660    -0.42951
Blood pressure
  Optimal                                     -0.00226    -0.53363
  Normal                                       Referent    Referent
  High normal                                  0.28320    -0.06773
  Stage I hypertension                         0.52168     0.26288
  Stage II-IV hypertension                     0.61859     0.46573
Diabetes                                       0.42839     0.59626
Smoker                                         0.52337     0.29246
Baseline survival function at 10 years, S(t)   0.90015     0.96246



a simplified approach to predict risk for initial CHD events in 
outpatients free of disease, drawing on national programs for 
treatment of elevated blood pressure and TC, without a loss in 
accuracy. Other factors, such as fibrinogen, lipoprotein(a), ERT, 
family history of premature CHD, and hypertensive therapy 
have been or will be evaluated as baseline data and greater 
follow-up experience become available. 

Appendix 
Application of Tables 6 and 7 

The β-coefficients given in Table 6 are used to compute a linear
function. The latter is corrected for the averages of the participants'
risk factors, and the subsequent result is exponentiated and used to
calculate a 10-year probability of CHD after insertion into a survival
function. The following explanation and an example treat each of
these steps in a serial fashion, using Table 6 for the illustration
below.

(Equation 1): L_Chol(men) = 0.04826 × age - 0.65945 (if cholesterol
<160) + 0.0 (if cholesterol 160 to 199) + 0.17692 (if cholesterol 200
to 239) + 0.50539 (if cholesterol 240 to 279) + 0.65713 (if
cholesterol ≥280) + 0.49744 (if HDL-C <35) + 0.24310 (if HDL-C 35 to
44) + 0.0 (if HDL-C 45 to 49) - 0.05107 (if HDL-C 50 to 59)
- 0.48660 (if HDL-C ≥60) - 0.00226 (if blood pressure [BP]
optimal) + 0.0 (if BP normal) + 0.28320 (if BP high normal)
+ 0.52168 (if BP stage I hypertension) + 0.61859 (if BP stage II
hypertension) + 0.42839 (if diabetes present) + 0.0 (if diabetes not
present) + 0.52337 (if smoker) + 0.0 (if not smoker).

The function is evaluated at the values of the means for
each variable. Call it G, where (Equation 1): G_Chol(men)
= 0.04826 × 48.5926 - 0.65945 × 0.07433 + 0.17692 × 0.38851
+ 0.50539 × 0.16673 + 0.65713 × 0.05826 + 0.49744 × 0.19285
+ 0.24310 × 0.35476 - 0.05107 × 0.19646 - 0.48660 × 0.10727
- 0.00226 × 0.20048 + 0.28320 × 0.20048 + 0.52168 × 0.22820
+ 0.61859 × 0.13057 + 0.42839 × 0.05223 + 0.52337 × 0.40458
= 3.0975. Similarly, for women, G_Chol = 9.92545. For the LDL score
sheets, G_LDL for men is 3.00069 and for women 9.914136.
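The centering constant G can be checked numerically. The following is a minimal sketch, not the authors' software: it pairs each male TC-model coefficient with the cohort mean quoted in the G_Chol expression above and sums the products. The names `MEN_TC_TERMS` and `centering_constant` are hypothetical.

```python
# (beta, cohort mean) pairs for men, transcribed from the G_Chol expression.
# Referent categories contribute 0.0 and are therefore omitted.
MEN_TC_TERMS = [
    (0.04826, 48.5926),   # age, y
    (-0.65945, 0.07433),  # TC <160 mg/dL
    (0.17692, 0.38851),   # TC 200-239
    (0.50539, 0.16673),   # TC 240-279
    (0.65713, 0.05826),   # TC >=280
    (0.49744, 0.19285),   # HDL-C <35 mg/dL
    (0.24310, 0.35476),   # HDL-C 35-44
    (-0.05107, 0.19646),  # HDL-C 50-59
    (-0.48660, 0.10727),  # HDL-C >=60
    (-0.00226, 0.20048),  # BP optimal
    (0.28320, 0.20048),   # BP high normal
    (0.52168, 0.22820),   # BP stage I hypertension
    (0.61859, 0.13057),   # BP stage II hypertension
    (0.42839, 0.05223),   # diabetes
    (0.52337, 0.40458),   # smoker
]


def centering_constant(terms):
    """Return the linear function evaluated at the cohort means (G)."""
    return sum(beta * mean for beta, mean in terms)


g_chol_men = centering_constant(MEN_TC_TERMS)
print(f"G_Chol (men) = {g_chol_men:.3f}")
```

The sum should agree with the published value of 3.0975 to about three decimal places; small differences in the last digit are expected because the printed means are rounded.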

This value of G is subtracted from function L to produce function
A (Equation 2), which is then exponentiated to produce B (Equation
3). The latter represents the relative odds for CHD. The survival
value s(t) is exponentiated by B and subtracted from 1.0 to calculate
the 10-year probability of CHD (Equation 4).

(Equation 2): A = L - G (where G_Chol = 3.0975 for men,
9.92545 for women; similarly for Table 7, G_LDL = 3.00069 for
men, 9.914136 for women).

(Equation 3): B = e^A.

(Equation 4): P = 1 - [s(t)]^B [where s(t)_Chol 10 years = 0.90015 for
men, 0.96246 for women; similarly for Table 7, s(t)_LDL 10
years = 0.90017 for men, 0.9628 for women].

Consider a 55-year-old man with cholesterol of 250 mg/dL, HDL-C
of 39 mg/dL, blood pressure (146/88 mm Hg) that falls into stage I
hypertension, and no diabetes, who is a smoker. In this instance, after
Equation 1, L = 55 × 0.04826 + 0.50539 + 0.24310 + 0.52168 + 0.52337
= 4.4478. After Equation 2, A = 4.4478 - 3.0975 = 1.3503, and after
Equation 3, B = e^1.3503 = 3.85874. Finally, after Equation 4,
P = 1 - 0.90015^3.85874 = 1 - 0.66637 = 0.3336, for a 33% chance of
developing CHD over 10 years. According to the point score sheet, 55 years
old (4 points) + cholesterol of 250 mg/dL (2 points) + HDL-C of 39
mg/dL (1 point) + stage I blood pressure (2 points) + smoker (2
points) = 11 points, corresponding to a 31% chance of developing CHD
over 10 years. An average 55-year-old man has a 16% risk, and an ideal
man has a 7% risk. Similar calculations can be done for women and for
the LDL-C prediction models and score sheets.
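The four steps of the worked example can be sketched in code. This is an illustrative sketch only, restricted to the male TC model; the coefficients, G, and s(t) come from Table 6 and the Appendix, while the function name `chd_risk_10yr_men_tc`, the band layout, and the blood pressure labels are assumptions made for the illustration.

```python
import math

# Men, TC-category model (Table 6). Each band is (exclusive upper bound, beta).
AGE_BETA = 0.04826
TC_BANDS = [(160, -0.65945), (200, 0.0), (240, 0.17692),
            (280, 0.50539), (math.inf, 0.65713)]
HDL_BANDS = [(35, 0.49744), (45, 0.24310), (50, 0.0),
             (60, -0.05107), (math.inf, -0.48660)]
BP_BETA = {"optimal": -0.00226, "normal": 0.0, "high normal": 0.28320,
           "stage I": 0.52168, "stage II": 0.61859}
DIABETES_BETA = 0.42839
SMOKER_BETA = 0.52337
G_CHOL_MEN = 3.0975   # linear function at the cohort means
S10_MEN = 0.90015     # baseline survival at 10 years


def band_beta(value, bands):
    """Return the coefficient of the first band whose upper bound exceeds value."""
    for upper, beta in bands:
        if value < upper:
            return beta
    raise ValueError("value not covered by bands")


def chd_risk_10yr_men_tc(age, tc, hdl, bp_category, diabetes, smoker):
    """10-year CHD probability following Equations 1 to 4."""
    linear = (AGE_BETA * age                       # Equation 1
              + band_beta(tc, TC_BANDS)
              + band_beta(hdl, HDL_BANDS)
              + BP_BETA[bp_category]
              + (DIABETES_BETA if diabetes else 0.0)
              + (SMOKER_BETA if smoker else 0.0))
    a = linear - G_CHOL_MEN                        # Equation 2
    b = math.exp(a)                                # Equation 3
    return 1.0 - S10_MEN ** b                      # Equation 4


risk = chd_risk_10yr_men_tc(55, 250, 39, "stage I", diabetes=False, smoker=True)
print(f"10-year CHD risk: {risk:.1%}")
```

For the 55-year-old smoker described above, the result should land close to the 0.3336 obtained by hand. The women's model would additionally need the age-squared term, which this sketch does not cover.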



TABLE 7. β-Coefficients Underlying CHD Prediction Sheets
Using LDL-C Categories

Variable                                       Men         Women
Age, y                                         0.04808     0.33994
Age squared, y                                             -0.0027
LDL-C, mg/dL
  <100                                        -0.69281    -0.42616
  100-129                                      Referent    Referent
  130-159                                      0.00389     0.01366
  160-189                                      0.26755     0.26948
  ≥190                                         0.56705     0.33251
HDL-C, mg/dL
  <35                                          0.48598     0.88121
  35-44                                        0.21643     0.36312
  45-49                                        Referent    0.19247
  50-59                                       -0.04710     Referent
  ≥60                                         -0.34190    -0.35404
Blood pressure
  Optimal                                     -0.02642    -0.51204
  Normal                                       Referent    Referent
  High normal                                  0.30104    -0.03484
  Stage I hypertension                         0.55714     0.28533
  Stage II-IV hypertension                     0.65107     0.50403
Diabetes                                       0.42146     0.61313
Smoker                                         0.54377     0.29737
Baseline survival function at 10 years, S(t)   0.90017     0.9628



Acknowledgments 

This work is from the National Heart, Lung, and Blood Institute's
Framingham Heart Study, supported by NIH/NHLBI contract
N01-HC-38038. The authors would like to acknowledge the careful
review and helpful criticism by Dr James Cleeman, Coordinator of 
the National Cholesterol Education Program at the National Heart, 
Lung, and Blood Institute. 

References 

1. McGovern PG, Pankow JS, Shahar E, Doliszny KM, Folsom AR, 
Blackburn H, Luepker RV, the Minnesota Heart Survey Investigators. 
Recent trends in acute coronary heart disease: mortality, morbidity, 
medical care, and risk factors. N Engl J Med. 1996;334:884-890.

2. Gordon T, Kannel WB. Multiple risk functions for predicting coronary 
heart disease: the concept, accuracy, and application. Am Heart J. 1982; 
103:1031-1039. 

3. Kannel WB, McGee DL. Diabetes and glucose tolerance as risk factors for
cardiovascular disease: the Framingham Study. Diabetes Care. 1979;2:120-126.

4. Gordon T, Castelli WP, Hjortland MC, Kannel WB, Dawber TR. 
Diabetes, blood lipids, and the role of obesity in coronary heart disease 
risk for women. Ann Intern Med. 1977;87:393-397. 

5. The Expert Panel. Report of the National Cholesterol Education Program 
Expert Panel on detection, evaluation, and treatment of high blood cho- 
lesterol in adults. Arch Intern Med. 1988;34:193-201. 

6. Expert Panel on Detection, Evaluation, and Treatment of High Blood 
Cholesterol in Adults. Summary of the second report of the National 
Cholesterol Education Program (NCEP) expert panel on detection, eval- 
uation, and treatment of high blood cholesterol in adults (Adult Treatment 
Panel II). JAMA. 1993;269:3015-3023. 

7. The Expert Panel. National Cholesterol Education Program Second Report: the
expert panel on detection, evaluation, and treatment of high blood cholesterol in
adults (Adult Treatment Panel II). Circulation. 1994;89:1333-1445.

8. Anderson KM, Odell PM, Wilson PWF, Kannel WB. Cardiovascular 
disease risk profiles. Am Heart J. 1991;121:293-298. 

9. The Expert Panel. Expert panel on detection, evaluation and treatment of 
high blood cholesterol in adults: summary of the second report of the 
NCEP expert panel (Adult Treatment Panel II). JAMA. 1993;269: 
3015-3023. 

10. Joint National Committee. The fifth report of the Joint National Com- 
mittee on detection, evaluation, and treatment of high blood pressure 
(JNC V). Arch Intern Med. 1993;153:154-183. 

11. Anderson KM, Wilson PWF, Odell PM, Kannel WB. An updated coro- 
nary risk profile: a statement for health professionals. Circulation. 1991; 
83:357-363. 

12. Kannel WB, Feinleib M, McNamara PM, Garrison RJ, Castelli WP. An 
investigation of coronary heart disease in families: the Framingham 
Offspring Study. Am J Epidemiol. 1979;110:281-290. 

13. Abell LL, Levy BB, Brodie BB, Kendall FE. A simplified method for the 
estimation of total cholesterol in serum and demonstration of its speci- 
ficity. J Biol Chem. 1952;195:357-366. 

14. Lipid Research Clinics Program. Manual of Laboratory Operation. 
Bethesda, Md: National Institutes of Health; 1974:75-628. 

15. Friedewald WT, Levy RI, Fredrickson DS. Estimation of the concentration
of low-density lipoprotein cholesterol in plasma, without the use of
the preparative ultracentrifuge. Clin Chem. 1972;18:499-502.

16. Manual of Laboratory Operations: Lipid Research Clinics Program, 
Lipid and Lipoprotein Analysis. Washington, DC: National Institutes of 
Health, US Department of Health and Human Services; 1982. 

17. Kannel WB, Wolf PA, Garrison RJ. Monograph Section 34: Some Risk 
Factors Related to the Annual Incidence of Cardiovascular Disease and 
Death Using Pooled Repeated Biennial Measurements: Framingham 
Heart Study, 30-Year Followup. Springfield, Mass: National Technical 
Information Service; 1987:1-459. 

18. Neter J, Wasserman W. Multiple regression. In: Applied Linear Statistical
Models. Homewood, Ill: Irwin; 1974:214-272.

19. Cox DR. Regression models and life tables. J R Stat Soc B. 1972;34: 
187-220. 

20. Harrell FE Jr, Lee KL, Mark DB. Multivariable prognostic models: issues 
in developing models, evaluating assumptions and adequacy, and mea- 
suring and reducing errors. Stat Med. 1996;15:361-387. 

21. Benfante R, Reed D. Is elevated serum cholesterol level a risk factor for 
coronary heart disease in the elderly? JAMA. 1990;263:393-396. 

22. Wilson PWF, Castelli WP, Kannel WB. Coronary risk prediction in 
adults: the Framingham Heart Study. Am J Cardiol. 1987;59:91-94. 






23. Corti MC, Guralnik JM, Salive ME, Harris T, Field TS, Wallace RB, 
Berkman LF, Seeman TE, Glynn RJ, Hennekens CH, Havlik RJ. HDL 
cholesterol predicts coronary heart disease mortality in older persons. 
JAMA. 1995;274:539-544. 

24. Wilson PWF, Kannel WB. Hypercholesterolemia and coronary risk in the 
elderly: the Framingham Study. Am J Geriat Cardiol. 1993;2:52-56.

25. McNamara JR, Cohn JS, Wilson PWF, Schaefer EJ. Calculated values for 
low-density lipoprotein cholesterol in the assessment of lipid abnor- 
malities and coronary disease risk. Clin Chem. 1990;36:36-42. 

26. McNamara JR, Cole TG, Contois JH, Ferguson CA, Ordovas JM,
Schaefer EJ. Immunoseparation method for measuring low-density 
lipoprotein cholesterol directly from serum evaluated. Clin Chem. 1995; 
41:232-240. 

27. National Education Programs Working Group report on the management
of patients with hypertension and high blood cholesterol. Ann Intern Med.
1991;114:224-237.

28. Grover SA, Abrahamowicz M, Joseph L, Brewer C, Coupal L, Suissa S. The
benefits of treating hyperlipidemia to prevent coronary heart disease: estimating
changes in life expectancy and morbidity. JAMA. 1992;267:816-822.

29. Grover SA, Coupal L, Hu XP. Identifying adults at increased risk of 
coronary disease: how well do the current cholesterol guidelines work? 
JAMA. 1995;274:801-806.

30. Levy D. Have expert panel guidelines kept pace with new concepts in 
hypertension? Lancet. 1995;346:1112.

31. Cooper GR, Myers GL, Smith J, Schlant RC. Blood lipid measurements: 
variations and practical utility. JAMA. 1992;267:1652-1660.

32. Wilson PWF. Cholesterol screening: once is not enough. Arch Intern 
Med. 1995;155:2146-2147. 

33. Blankenhorn DH, Nessim SA, Johnson RL, Sanmarco ME, Azen SP, 
Cashin-Hemphill L. Beneficial effects of combined colestipol-niacin
therapy on coronary atherosclerosis and coronary venous bypass grafts. 
JAMA. 1987;257:3233-3240. 

34. The 4S Group. Randomised trial of cholesterol lowering in 4444 patients
with coronary heart disease: the Scandinavian Simvastatin Survival Study
(4S). Lancet. 1994;344:1383-1389.

35. Shepherd J, Cobbe SM, Ford I, Isles CG, Lorimer AR, MacFarlane PW,
McKillop JH, Packard CJ, West of Scotland Coronary Prevention Study 
Group. Prevention of coronary heart disease with pravastatin in men with 
hypercholesterolemia. N Engl J Med. 1995;333:1301-1307. 

36. Myers RH, Kiely DK, Cupples LA, Kannel WB. Parental history is an 
independent risk factor for coronary artery disease: the Framingham 
Study. Am Heart J. 1990;120:963-969. 

37. Ernst E, Resch KL. Fibrinogen as a cardiovascular risk factor: a
meta-analysis and review of the literature. Ann Intern Med. 1993;118:956-963.

38. Kannel WB, Wolf PA, Castelli WP, D'Agostino RB. Fibrinogen and risk of
cardiovascular disease: the Framingham Study. JAMA. 1987;258:1183-1186.

39. Kannel WB, D'Agostino RB, Wilson PWF, Belanger AJ, Gagnon DR.
Diabetes, fibrinogen, and risk of cardiovascular disease: the Framingham 
experience. Am Heart J. 1990;120:672-676. 

40. Barasch E, Benderly M, Graff E, Behar S, Reicher-Reiss H, Caspi A, 
Pelled B, Reisin L, Roguin N, Goldbourt U. Plasma fibrinogen levels and 
their correlates in 6457 coronary heart disease patients: the Bezafibrate 
Infarction Prevention (BIP) Study. J Clin Epidemiol. 1995;48:757-765. 

41. Pasley BH, Standfast SJ, Katz SH. Prescribing estrogen during meno- 
pause: physician survey of practices in 1974 and 1981. Public Health 
Rep. 1984;99:424-429. 

42. Bush TL, Cowan LD, Barrett-Connor EL, Criqui MH, Karon JM, Wallace 
RB, Tyroler HA, Rifkind BM. Estrogen use and all-cause mortality. 
JAMA. 1983;249:903-906. 

43. Barrett-Connor EL, Bush TL. Estrogen and coronary heart disease in 
women. JAMA. 1991;265:1861-1867. 

44. Stampfer MJ, Colditz GA, Willett WC, Manson JE, Rosner B, Speizer
FE, Hennekens CH. Postmenopausal estrogen therapy and cardiovascular 
disease: ten-year follow-up from the Nurses' Health Study. N Engl J Med. 
1991;325:756-762. 



45. Stampfer MJ, Colditz GA. Estrogen replacement therapy and coronary 
heart disease: quantitative assessment of the epidemiologic evidence. 
Prev Med. 1991;20:47-63.

46. Wilson PWF, Garrison RJ, Castelli WP. Postmenopausal estrogen use,
cigarette smoking, and cardiovascular morbidity: the Framingham Study. 
N Engl J Med. 1985;313:1038-1043. 

47. Eaker ED, Castelli WP. Coronary heart disease and its risk factors among 
women in the Framingham Study. In: Eaker ED, Packard B, Wenger NK, 
Clarkson TB, Tyroler HA, eds. Coronary Heart Disease in Women. New 
York, NY: Haymarket Doyma Inc; 1987:122-130. 

48. Petitti DB. Reporting results. In: Meta-Analysis, Decision Analysis, and
Cost-Effectiveness Analysis. New York, NY: Oxford; 1994:197-211. 

49. Powell KE, Thompson PD, Caspersen CJ, Kendrick JS. Physical activity 
and the incidence of coronary heart disease. Annu Rev Public Health.
1987;8:253-287. 

50. Lee IM, Hsieh CC, Paffenbarger RS Jr. Exercise intensity and longevity
in men: the Harvard Alumni Health Study. JAMA. 1995;273:1179-1184.

51. Berlin JA, Colditz GA. A meta-analysis of physical activity in the pre- 
vention of coronary heart disease. Am J Epidemiol. 1990;132:612-628. 

52. Wilson PWF. High-density lipoprotein, low-density lipoprotein and
coronary artery disease. Am J Cardiol. 1990;66(suppl A):7-10.

53. Anderson KM, Wilson PWF, Garrison RJ, Castelli WP. Longitudinal and secular 
trends in lipoprotein cholesterol measurements in a general population sample: 
the Framingham Offspring Study. Atherosclerosis. 1987;68:59-66. 

54. Helmrich SP, Ragland DR, Leung RW, Paffenbarger RS Jr. Physical
activity and reduced occurrence of non-insulin-dependent diabetes 
mellitus. N Engl J Med. 1991;325:147-152. 

55. Burchfiel CM, Curb JD, Sharp DS, Rodriguez BL, Arakaki R, Chyou PH, 
Yano K. Distribution and correlates of insulin in elderly men: the Honolulu 
Heart Program. Arterioscler Thromb Vasc Biol. 1995;15:2213-2221.

56. Wood PD. Physical activity, diet, and health: independent and interactive 
effects. Med Sci Sports Exerc. 1994;26:838-843. 

57. Dannenberg AL, Keller JB, Wilson PWF, Castelli WP. Leisure time physical 
activity in the Framingham Offspring Study: description, seasonal variation, 
and risk factor correlates. Am J Epidemiol. 1989;129:76-87. 

58. Wood PD, Haskell WL, Klein H, Lewis S, Stern MP, Farquhar JW. The
distribution of plasma lipoproteins in middle-aged male runners.
Metabolism. 1976;25:1249-1257.

59. Gordon T, Garcia-Palmieri MR, Kagan A, Kannel WB, Schiffman J. 
Differences in coronary heart disease in Framingham, Honolulu and 
Puerto Rico. J Chronic Dis. 1974;27:329-344. 

60. McGee D, Gordon T. The Framingham Study applied to four other US-based
epidemiological studies of cardiovascular disease (Section No. 31).
Bethesda, Md: US Department of Health, Education, and Welfare, NIH; 
1976:76-1083. 

61. Brand RJ, Rosenman RH, Scholtz RI. Multivariate prediction of coronary 
heart disease in the Western Collaborative Group Study compared to the 
findings of the Framingham Study. Circulation. 1976;53:348-355. 

62. Leaverton PE, Sorlie PD, Kleinman JC, Dannenberg AL, Ingster-Moore 
L, Kannel WB, Cornoni-Huntley JC. Representativeness of the Framingham
risk model for coronary heart disease mortality: a comparison
with a national cohort study. J Chronic Dis. 1987;40:775-784. 

63. The Multiple Risk Factor Intervention Trial Group. Statistical design 
considerations in the NHLI multiple risk factor intervention trial 
(MRFIT). J Chronic Dis. 1977;30:261-275. 

64. Ramsay LE, Haq IU, Jackson PR, Yeo WW, Pickin DM, Payne JN. 
Targeting lipid-lowering drug therapy for primary prevention of coronary 
disease: an updated Sheffield table. Lancet. 1996;348:387-388. 

65. West of Scotland Coronary Prevention Group. West of Scotland Coronary 
Prevention Study: identification of high-risk groups and comparison with 
other cardiovascular intervention trials. Lancet. 1996;348:1339-1342. 

66. Kinosian B, Glick H, Garland G. Cholesterol and coronary heart disease: 
predicting risks by levels and ratios. Ann Intern Med. 1994;121:641-647. 






ARTICLES 



Genome-wide association study identifies 
novel breast cancer susceptibility loci 

Douglas F. Easton 1, Karen A. Pooley 2, Alison M. Dunning 2, Paul D. P. Pharoah 2, Deborah Thompson 1,
Dennis G. Ballinger 3, Jeffery P. Struewing 4, Jonathan Morrison 2, Helen Field 2, Robert Luben 5, Nicholas Wareham 5,
Shahana Ahmed 2, Catherine S. Healey 2, Richard Bowman 6, the SEARCH collaborators 2*, Kerstin B. Meyer 7,
Christopher A. Haiman 8, Laurence K. Kolonel 9, Brian E. Henderson 8, Loic Le Marchand 9, Paul Brennan 10,
Suleeporn Sangrajrang 11, Valerie Gaborieau 10, Fabrice Odefrey 10, Chen-Yang Shen 12, Pei-Ei Wu 12,
Hui-Chun Wang 12, Diana Eccles 13, D. Gareth Evans 14, Julian Peto 15, Olivia Fletcher 16, Nichola Johnson 16,
Sheila Seal 17, Michael R. Stratton 17,18, Nazneen Rahman 17, Georgia Chenevix-Trench 19, Stig E. Bojesen 20,
Borge G. Nordestgaard 20, Christen K. Axelsson 21, Montserrat Garcia-Closas 22, Louise Brinton 22, Stephen Chanock 23,
Jolanta Lissowska 24, Beata Peplonska 25, Heli Nevanlinna 26, Rainer Fagerholm 26, Hannaleena Eerola 26,27,
Daehee Kang 28, Keun-Young Yoo 28,29, Dong-Young Noh 28, Sei-Hyun Ahn 30, David J. Hunter 31,32,
Susan E. Hankinson 32, David G. Cox 31, Per Hall 33, Sara Wedren 33, Jianjun Liu 34, Yen-Ling Low 34,
Natalia Bogdanova 35,36, Peter Schurmann 36, Thilo Dork 36, Rob A. E. M. Tollenaar 37, Catharina E. Jacobi 38,
Peter Devilee 39, Jan G. M. Klijn 40, Alice J. Sigurdson 41, Michele M. Doody 41, Bruce H. Alexander 42, Jinghui Zhang 4,
Angela Cox 43, Ian W. Brock 43, Gordon MacPherson 43, Malcolm W. R. Reed 44, Fergus J. Couch 45, Ellen L. Goode 45,
Janet E. Olson 45, Hanne Meijers-Heijboer 46,47, Ans van den Ouweland 47, Andre Uitterlinden 48,
Fernando Rivadeneira 48, Roger L. Milne 49, Gloria Ribas 49, Anna Gonzalez-Neira 49, Javier Benitez 49, John L. Hopper 50,
Margaret McCredie 51, Melissa Southey 50, Graham G. Giles 52, Chris Schroen 53, Christina Justenhoven 54,
Hiltrud Brauch 54, Ute Hamann 55, Yon-Dschun Ko 56, Amanda B. Spurdle 19, Jonathan Beesley 19, Xiaoqing Chen 19,
kConFab 57*, AOCS Management Group 19,57*, Arto Mannermaa 58,59, Veli-Matti Kosma 58,59, Vesa Kataja 58,60,
Jaana Hartikainen 58,59, Nicholas E. Day 5, David R. Cox 3 & Bruce A. J. Ponder 2,7

Breast cancer exhibits familial aggregation, consistent with variation in genetic susceptibility to the disease. Known 
susceptibility genes account for less than 25% of the familial risk of breast cancer, and the residual genetic variance is likely 
to be due to variants conferring more moderate risks. To identify further susceptibility alleles, we conducted a two-stage 
genome-wide association study in 4,398 breast cancer cases and 4,316 controls, followed by a third stage in which 30 single 
nucleotide polymorphisms (SNPs) were tested for confirmation in 21,860 cases and 22,578 controls from 22 studies. We 
used 227,876 SNPs that were estimated to correlate with 77% of known common SNPs in Europeans at r^2 > 0.5. SNPs in five
novel independent loci exhibited strong and consistent evidence of association with breast cancer (P < 10^-7). Four of these
contain plausible causative genes (FGFR2, TNRC9, MAP3K1 and LSP1). At the second stage, 1,792 SNPs were significant at the 
P < 0.05 level compared with an estimated 1,343 that would be expected by chance, indicating that many additional common 
susceptibility alleles may be identifiable by this approach. 



Breast cancer is about twice as common in the first-degree relatives of
women with the disease as in the general population, consistent with
variation in genetic susceptibility to the disease 1. In the 1990s, two
major susceptibility genes for breast cancer, BRCA1 and BRCA2, were
identified 2,3. Inherited mutations in these genes lead to a high risk of
breast and other cancers 4. However, the majority of multiple case
breast cancer families do not segregate mutations in these genes.
Subsequent genetic linkage studies have failed to identify further
major breast cancer genes 5. These observations have led to the
proposal that breast cancer susceptibility is largely 'polygenic': that is,
susceptibility is conferred by a large number of loci, each with a small
effect on breast cancer risk 6. This model is consistent with the
observed patterns of familial aggregation of breast cancer 7. However,

Affiliations of the above authors are given at the end of the paper.

*Lists of consortia participants and affiliations appear after author affiliations.



progress in identifying the relevant loci has been slow. As linkage
studies lack power to detect alleles with moderate effects on risk, large
case-control association studies are required. Such studies have
identified variants in the DNA repair genes CHEK2, ATM, BRIP1 and
PALB2 that confer an approximately twofold risk of breast cancer,
but these variants are rare in the population 8-14. A recent study has
shown that a common coding variant in CASP8 is associated with a
moderate reduction in breast cancer risk 15. After accounting for all
the known breast cancer loci, more than 75% of the familial risk of
the disease remains unexplained 16.

Recent technological advances have provided platforms that allow 
hundreds of thousands of SNPs to be analysed in association studies, 
thus providing a basis for identifying moderate risk alleles without 






©2007 Nature Publishing Group 



prior knowledge of position or function. It has been estimated that
there are 7 million common SNPs in the human genome (with minor
allele frequency, m.a.f., >5%) 17. However, because recombination
tends to occur at distinct 'hot-spots', neighbouring polymorphisms
are often strongly correlated (in 'linkage disequilibrium', LD) with
each other. The majority of common genetic variants can therefore be
evaluated for association using a few hundred thousand SNPs as tags
for all the other variants 18. We aimed to identify further breast cancer
susceptibility loci in a three-stage association study 19. In the first
stage, we used a panel of 266,722 SNPs, selected to tag known
common variants across the entire genome 18. These SNPs were genotyped
in 408 breast cancer cases and 400 controls from the UK; data were
analysed for 390 cases and 364 controls genotyped for ≥80% of
the SNPs. The cases were selected to have a strong family history of
breast cancer, equivalent to at least two affected female first-degree
relatives, because such cases are more likely to carry susceptibility
alleles 20. Initially, we analysed 227,876 SNPs (85%) with genotypes on
at least 80% of the subjects. We estimate that these SNPs are
correlated with 58% of common SNPs in the HapMap CEPH/CEU (Utah
residents with ancestry from northern and western Europe) samples
at r^2 > 0.8, and 77% at r^2 > 0.5 (mean r^2 = 0.75; see Supplementary
Fig. 1) (http://www.hapmap.org/) 21. As expected, coverage was
strongly related to m.a.f.: 70% of SNPs with m.a.f. > 10% were tagged
at r^2 > 0.8, compared with 23% of SNPs with m.a.f. 5-10%. The main
analyses were restricted to 205,586 SNPs that had a call rate of 90%
and whose genotype distributions did not differ from
Hardy-Weinberg equilibrium in controls (at P < 10^-5).
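A Hardy-Weinberg filter of the kind described above can be sketched as follows. The text does not say which test the authors applied, so this sketch assumes the common 1 d.f. Pearson chi-square goodness-of-fit test on control genotype counts; the function names `hwe_chi_square` and `passes_hwe` are hypothetical, and only the 10^-5 threshold is taken from the text.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Pearson chi-square test of Hardy-Weinberg proportions (1 d.f.).

    Arguments are the observed counts of the three genotypes.
    Returns (statistic, p_value).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes observed")
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Skip cells with zero expectation (monomorphic SNPs).
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 d.f.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


def passes_hwe(n_aa, n_ab, n_bb, threshold=1e-5):
    """True when the control genotypes do not deviate at the given threshold."""
    return hwe_chi_square(n_aa, n_ab, n_bb)[1] >= threshold


print(passes_hwe(25, 50, 25))   # genotypes exactly at HWE proportions
print(passes_hwe(50, 0, 50))    # no heterozygotes at all
```

An exact test is often preferred when genotype counts are small; the chi-square form is used here only because it is short.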

For the second stage we selected 12,711 SNPs, approximately 5% of 
those typed in stage 1, on the basis of the significance of the difference
in genotype frequency between cases and controls. These SNPs were 



[Figure 1, panels a and b: quantile-quantile plots; horizontal axis, Expected χ^2.]



Figure 1 | Quantile-quantile plots for the test statistics (Cochran-Armitage
1 d.f. χ^2 trend tests) for stages 1 and 2. a, Stage 1; b, stage 2. Black
dots are the uncorrected test statistics. Red dots are the statistics corrected by
the genomic control method (λ = 1.03 for stage 1, λ = 1.06 for stage 2).
Under the null hypothesis of no association at any locus, the points would be
expected to follow the black line.
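The Cochran-Armitage 1 d.f. trend statistic named in the caption can be sketched for a single SNP as below. This is a textbook form of the test with additive genotype scores 0, 1, 2, given as an assumption about the scoring rather than a description of the authors' pipeline; the function name `cochran_armitage_trend` is hypothetical.

```python
import math


def cochran_armitage_trend(case_counts, control_counts, scores=(0, 1, 2)):
    """Cochran-Armitage trend test for a 2 x k genotype table (1 d.f.).

    case_counts and control_counts hold the number of subjects in each
    genotype class; scores are the per-class weights.
    Returns (chi_square, p_value).
    """
    totals = [r + s for r, s in zip(case_counts, control_counts)]
    n = sum(totals)
    n_cases = sum(case_counts)
    n_controls = sum(control_counts)
    sum_rt = sum(r * t for r, t in zip(case_counts, scores))
    sum_nt = sum(c * t for c, t in zip(totals, scores))
    sum_nt2 = sum(c * t * t for c, t in zip(totals, scores))
    denominator = n_cases * n_controls * (n * sum_nt2 - sum_nt ** 2)
    if denominator == 0:
        raise ValueError("trend statistic undefined for this table")
    chi_square = n * (n * sum_rt - n_cases * sum_nt) ** 2 / denominator
    # Survival function of a chi-square variable with 1 d.f.
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return chi_square, p_value


# Identical genotype distributions in cases and controls give no trend.
print(cochran_armitage_trend((10, 20, 30), (10, 20, 30)))
```

Sorting such statistics across all SNPs and plotting them against the quantiles of a χ^2 distribution with 1 d.f. gives a quantile-quantile plot of the kind the caption describes.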



then genotyped in a further 3,990 invasive breast cancer cases and 
3,916 controls from the SEARCH study, using a custom-designed 
oligonucleotide array. In the main analyses, we considered 10,405 
SNPs with call rate of >95% that did not deviate from Hardy- 
Weinberg equilibrium in controls. 

Comparison of the observed and expected distribution of test statistics
showed some evidence for an inflation of the test statistics in both
stage 1 (inflation factor λ = 1.03, 95% confidence interval (CI) 1.02-
1.04) and stage 2 (λ = 1.06, 95% CI 1.04-1.12), based on the 90% least
significant SNPs (Fig. 1). Possible explanations for this inflation
include population stratification, cryptic relatedness among subjects,
and differential genotype calling between cases and controls. There
was evidence for an excess of low call rate SNPs among the most
significant SNPs (P < 0.01) in stage 1, but not in stage 2, suggesting
that some of this effect is a genotyping artefact (Supplementary Table
1). However, the inflation was still present among SNPs with call rate
>99% in both cases and controls, possibly reflecting population
substructure. We computed 1 degree of freedom (d.f.) association tests for
each SNP, combining stages 1 and 2. After adjustment for this inflation
by the genomic control method (ref. 22), we observed more associations than
would have been expected by chance at P < 0.05 (Table 1). One SNP
(dbSNP rs2981582) was significant at the P < 10⁻⁷ level that has been
proposed as appropriate for genome-wide studies (ref. 23).
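
The genomic control adjustment described above can be illustrated with a short sketch. It is not the authors' code: it assumes the widely used median-based estimator of the inflation factor λ, whereas the text states that λ was estimated from the 90% least significant SNPs. The function names are hypothetical.

```python
import math
import statistics

# Median of a chi-square distribution with 1 d.f.: (Phi^-1(0.75))^2, about 0.4549.
CHI2_1DF_MEDIAN = statistics.NormalDist().inv_cdf(0.75) ** 2


def inflation_factor(chi2_stats):
    """Median-based estimate of lambda (an assumption; the paper used the
    90% least significant SNPs rather than the plain median)."""
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN


def chi2_1df_pvalue(x):
    """Upper-tail P value of a 1 d.f. chi-square statistic."""
    return math.erfc(math.sqrt(x / 2.0))


def genomic_control_adjust(chi2_stats):
    """Divide every test statistic by lambda; lambda is never taken below 1."""
    lam = max(1.0, inflation_factor(chi2_stats))
    return lam, [x / lam for x in chi2_stats]
```

Dividing each 1 d.f. statistic by λ before computing its P value is what moves a SNP from the "Observed" to the "Observed adjusted" column of Table 1.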

In the third stage, to establish whether any SNPs were definitely
associated with risk, we tested 30 of the most significant SNPs in 22
additional case-control studies, comprising 21,860 cases of invasive
breast cancer, 988 cases of carcinoma in situ (CIS) and 22,578 controls
(Supplementary Table 2). Six SNPs showed associations in stage 3 that
were significant at P ≤ 10⁻⁵ with effects in the same direction as in
stages 1 and 2 (Table 2, Supplementary Table 3, and Fig. 2). All these
SNPs reached a combined significance level of P < 10⁻⁷ (ranging from
2 × 10⁻⁷⁶ to 3 × 10⁻⁹). Of these six SNPs, five were within genes or
LD blocks containing genes. SNP rs2981582 lies in intron 2 of FGFR2
(also known as CEK3), which encodes the fibroblast growth factor
receptor 2. SNPs rs12443621 and rs8051542 are both located in an
LD block containing the 5' end of TNRC9 (also known as TOX3), a
gene of uncertain function containing a tri-nucleotide repeat motif, as
well as the hypothetical gene, LOC643714. SNP rs889312 lies in an LD
block of approximately 280 kb that contains MAP3K1 (also known as
MEKK), which encodes the signalling protein mitogen-activated protein
kinase kinase kinase 1, in addition to two other genes: MGC33648
and MIER3. SNP rs3817198 lies in intron 10 of LSP1 (also known as
WP43), encoding lymphocyte-specific protein 1, an F-actin bundling
cytoskeletal protein expressed in haematopoietic and endothelial cells.
A further SNP, rs2107425, located just 110 kilobases (kb) from
rs3817198, was also identified (overall P = 0.00002). rs2107425 is
within the H19 gene, an imprinted maternally expressed untranslated
messenger RNA closely involved in regulation of the insulin growth
factor gene, IGF2. In stage 3, however, rs2107425 was only weakly
significant after adjustment for rs3817198 by logistic regression
(P = 0.06). This suggests that the association with breast cancer risk
may be driven by variants in LSP1 rather than in H19. The sixth SNP
reaching a combined P < 10⁻⁷ was rs13281615, which lies on 8q. It is
correlated with SNPs in a 110 kb LD block that contains no known

Table 1 | Number of significant associations after stage 2

Level of significance   Observed   Observed adjusted*   Expected   Ratio
0.01-0.05               1,239      1,162                934.3      1.24
0.001-0.01              574        517                  347.6      1.49
0.0001-0.001            112        88                   53.3       1.65
0.00001-0.0001          16         12                   7.0        1.71
<0.00001                15         13                   0.96       13.5
All P < 0.05            1,956      1,792                1,343.2    1.33

Observed numbers of SNPs associated with breast cancer after stage 2, by level of significance,
before and after adjustment for population stratification, and expected numbers under the null
hypothesis of no association.

* Adjusted for inflation of the test statistic by the genomic control method.
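
As a quick arithmetic check of Table 1, the Ratio column is reproduced by dividing the adjusted observed count by the expected count in each row. The sketch below only re-derives the printed ratios from the printed counts; it makes no assumption about how the expected numbers themselves were obtained, and the variable names are illustrative.

```python
# Rows of Table 1: (significance band, observed, observed adjusted, expected).
TABLE_1 = [
    ("0.01-0.05", 1239, 1162, 934.3),
    ("0.001-0.01", 574, 517, 347.6),
    ("0.0001-0.001", 112, 88, 53.3),
    ("0.00001-0.0001", 16, 12, 7.0),
    ("<0.00001", 15, 13, 0.96),
]


def excess_ratio(observed_adjusted, expected):
    """Ratio of adjusted observed to expected numbers of associations."""
    return observed_adjusted / expected


# Per-band ratios, plus the totals that form the "All P < 0.05" row.
ratios = {band: excess_ratio(adj, exp) for band, _, adj, exp in TABLE_1}
total_observed = sum(row[1] for row in TABLE_1)
total_adjusted = sum(row[2] for row in TABLE_1)
total_expected = sum(row[3] for row in TABLE_1)
overall_ratio = excess_ratio(total_adjusted, total_expected)
```

The excess grows with the stringency of the threshold, which is the pattern expected if a minority of the selected SNPs are truly associated.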






©2007 Nature Publishing Group 



Table 2 | Summary of results for eleven SNPs selected for stage 3 that showed evidence of an association with breast cancer

rs number     Gene               Position*        m.a.f.†       Per allele OR      HetOR              HomOR              P-trend          P-trend       P-trend
                                                                (95% CI)           (95% CI)           (95% CI)           stages 1 and 2   stage 3       combined
rs2981582     FGFR2              10q 123342307    0.38 (0.30)   1.26 (1.23-1.30)   1.23 (1.18-1.28)   1.63 (1.53-1.72)   4 × 10⁻¹⁶        5 × 10⁻⁶²     2 × 10⁻⁷⁶
rs12443621    TNRC9/LOC643714    16q 51105538     0.46 (0.60)   1.11 (1.08-1.14)   1.14 (1.09-1.20)   1.23 (1.17-1.30)   10⁻⁷             9 × 10⁻¹⁴     2 × 10⁻¹⁹
rs8051542     TNRC9/LOC643714    16q 51091668     0.44 (0.20)   1.09 (1.06-1.13)   1.10 (1.05-1.16)   1.19 (1.12-1.27)   4 × 10⁻⁶         4 × 10⁻⁸      10⁻¹²
rs889312      MAP3K1             5q 56067641      0.28 (0.54)   1.13 (1.10-1.16)   1.13 (1.09-1.18)   1.27 (1.19-1.36)   4 × 10⁻⁶         3 × 10⁻¹⁵     7 × 10⁻²⁰
rs3817198     LSP1               11p 1865582      0.30 (0.14)   1.07 (1.04-1.11)   1.06 (1.02-1.11)   1.17 (1.08-1.25)   8 × 10⁻⁶         10⁻⁵          3 × 10⁻⁹
rs2107425     H19                11p 1977651      0.31 (0.44)   0.96 (0.93-0.99)   0.94 (0.90-0.98)   0.95 (0.89-1.01)   7 × 10⁻⁶         0.01          2 × 10⁻⁵
rs13281615                       8q 128424800     0.40 (0.56)   1.08 (1.05-1.11)   1.06 (1.01-1.11)   1.18 (1.10-1.25)   2 × 10⁻⁷         6 × 10⁻⁷      5 × 10⁻¹²

rs981782                         5p 45321475      0.47 (0.37)   0.96 (0.93-0.99)   0.96 (0.92-1.01)   0.92 (0.87-0.97)   8 × 10⁻⁵         0.003         9 × 10⁻⁶
rs30099                          5q 52454339      0.08 (0.39)   1.05 (1.01-1.10)   1.06 (1.00-1.11)   1.09 (0.96-1.24)   0.003            0.02          0.001
rs4666451                        2p 19150424      0.41 (0.04)   0.97 (0.94-1.00)   0.98 (0.93-1.02)   0.93 (0.87-0.99)   5 × 10⁻⁶         0.04          6 × 10⁻⁵

rs3803662‡    TNRC9/LOC643714    16q 51143842     0.25 (0.60)   1.20 (1.16-1.24)   1.23 (1.18-1.29)   [illegible]        [illegible]      [illegible]   [illegible]

OR, odds ratio; HetOR, odds ratio in heterozygotes; HomOR, odds ratio in rare homozygotes (relative to common homozygotes); CI, confidence interval.

* Build 36.2 position.

† Minor allele frequency in SEARCH (UK) study. Combined allele frequency from three Asian studies is given in parentheses.

‡ Not part of the initial tag SNP set but identified as a result of fine-scale mapping of the TNRC9/LOC643714 locus and typed in the stage 2 and stage 3 sets (but not the stage 1 set).



genes. The basis of this association therefore remains obscure. This
SNP is approximately 130 kb proximal to rs1447295, 60 kb proximal
to rs6983267 and 230 kb distal to rs16901979, recently shown to be
associated with prostate cancer (refs 24-26).

In addition to the seven SNPs described above, there was evidence
of association among the remaining 23 SNPs (global P = 0.001 in
stage 3). In particular, three SNPs showed some evidence of association
in stage 3 (P < 0.05, in each case in the same direction as in
stages 1 and 2; Table 2). SNPs rs981782 and rs30099 both lie in the
centromeric region of chromosome 5. rs4666451 lies on 2p, a region
for which some evidence of linkage to breast cancer in families has
been reported (ref. 5). The 20 other SNPs showed no evidence of association
in stage 3 (global P = 0.11), suggesting that most of these associations
from stages 1 and 2 were false positives.



FGFR2 

The most significantly associated SNP, rs2981582, lies within a 25 kb LD
block almost entirely within intron 2 of FGFR2. We found no evidence
of association with SNPs elsewhere in the gene (Fig. 3a). In an attempt to
identify a causal variant, we first identified the 19 common variants
(m.a.f. > 0.05) in this block from HapMap CEU data. These were tagged
(r² > 0.8) by 7 SNPs including rs2981582. The additional tag SNPs were
genotyped in the SEARCH study cases and controls. Multiple logistic
regression analysis of these variants found no additional evidence for
association after adjusting for rs2981582. Haplotype analysis of these 7
SNPs indicated that multiple haplotypes carrying the minor (a) allele of
rs2981582 were associated with an increased risk of breast cancer, implying
that the association was being driven by rs2981582 itself or a variant
strongly correlated with it (Supplementary Table 4).




Figure 2 | Forest plots of the per-allele odds ratios for each of the five SNPs
reaching genome-wide significance. a, rs2981582; b, rs3803662; c, rs889312;
d, rs13281615; and e, rs3817198. The x-axis gives the per-allele odds ratio.
Each row represents one study (see Supplementary Table 2), with summary
odds ratios for all European and all Asian studies, and all studies combined.
The area of the square for each study is proportional to the inverse of the
variance of the estimate. Horizontal lines represent 95% confidence
intervals. Diamonds represent the summary odds ratios, with 95%
confidence intervals, based on the stage 3 studies only.
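
The inverse-variance weighting mentioned in the caption can be sketched as follows. This is an illustration only: it assumes each 95% confidence interval is symmetric on the log odds ratio scale, and the fixed-effect pooling shown is a generic textbook method, not the stratified logistic regression that the Methods Summary says was used for the summary odds ratios. All function names are hypothetical.

```python
import math

Z_95 = 1.959963984540054  # two-sided 95% normal quantile


def log_or_standard_error(lower, upper):
    """Standard error of log(OR) recovered from a 95% CI (assumed symmetric on the log scale)."""
    return (math.log(upper) - math.log(lower)) / (2.0 * Z_95)


def inverse_variance_weight(lower, upper):
    """Weight proportional to the area of a forest-plot square."""
    return 1.0 / log_or_standard_error(lower, upper) ** 2


def fixed_effect_summary(estimates):
    """Pool (OR, lower, upper) tuples by inverse-variance weighting on the log scale."""
    weights = [inverse_variance_weight(lo, hi) for _, lo, hi in estimates]
    total = sum(weights)
    pooled_log_or = sum(w * math.log(o) for w, (o, _, _) in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1.0 / total)
    return (
        math.exp(pooled_log_or),
        math.exp(pooled_log_or - Z_95 * pooled_se),
        math.exp(pooled_log_or + Z_95 * pooled_se),
    )
```

For example, the per-allele estimate for rs2981582 in Table 2, 1.26 (1.23-1.30), corresponds to a standard error of about 0.014 on the log scale.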






Resequencing of this region in 45 subjects of European origin
identified 29 variants that were strongly correlated with rs2981582
(r² > 0.6) (http://cgwb.nci.nih.gov; Fig. 3b and Supplementary
Tables 5-8). A subset of 14 variants tagged 27 of these in European
(r² > 0.95) and Asian (Korean) samples (r² > 0.86). Two variants
could not be genotyped reliably. This new tagging set was then genotyped
in SEARCH and 3 studies from Asian populations; the Asian
studies were included because the LD is weaker, providing greater
power to resolve the causal variant (Fig. 3b, left panel). The strongest
association was found with rs7895676. On the assumption that there
is a single disease-causing allele, we calculated a likelihood for each
variant. 21 SNPs (including rs2981582) had a likelihood ratio of <1/100
relative to rs7895676, indicating that none of these are likely to be
the causal variant (Supplementary Table 8). Six variants were too
strongly correlated for their individual effects to be separated using
a genetic epidemiological approach. Functional assays will be
required to determine which is causally related to breast cancer risk.

Intron 2 of FGFR2 shows a high degree of conservation in mammals,
and contains several putative transcription-factor binding sites
(http://genomequebec.mcgill.ca/PReMod) (ref. 27), some of which lie in
close proximity to the relevant SNPs. We therefore speculate that
the association with breast cancer risk is mediated through regulation
of FGFR2 expression. Of possible relevance is that only three of these
variants (rs10736303, rs2981578 and rs35054928) are within
sequences conserved across all placental mammals (Fig. 3c and
Supplementary Table 8). Of these, the disease-associated allele of
rs10736303 generates a putative oestrogen receptor (ER) binding site.
rs35054928 lies immediately adjacent to a perfect POU domain protein
octamer (Oct) binding site. However, multiple splice variants
have been reported in FGFR2, and differential splicing might provide
an alternative mechanism for the association. FGFR2 is a receptor
tyrosine kinase that is amplified and overexpressed in 5-10% of
breast tumours (refs 28-30). Somatic missense mutations of FGFR2 that are
likely to be implicated in cancer development have also been demonstrated
in primary tumours and cell lines of multiple tumour types
(http://www.sanger.ac.uk/genetics/CGP/cosmic/) (refs 30, 31).

TNRC9/LOC643714 locus 

As two SNPs in the TNRC9/LOC643714 locus, rs12443621 and
rs8051542, both showed convincing evidence of association, we further
evaluated this region by genotyping, in the SEARCH set, an additional
19 SNPs tagging 101 common variants within the entire TNRC9 and
LOC643714 genes, based on the HapMap CEU data. SNPs tagging the
coding region of TNRC9 showed no evidence of association. The strongest
association was observed with rs3803662, a synonymous coding
SNP of LOC643714 that lies 8 kb upstream of TNRC9. This SNP was
therefore genotyped in the stage 3 set (Table 2). Logistic regression
analysis indicated that rs3803662 exhibited a stronger association with
disease than other SNPs, and the associations with other SNPs were
non-significant after adjustment for rs3803662. These results suggest
that the causal variant is closely correlated with rs3803662. Four SNPs
in the HapMap CEU data (rs17271951, rs1362548, rs3095604 and
rs4784227) that span LOC643714 and the 5' regulatory regions of
TNRC9 are strongly correlated with rs3803662, and it therefore
remains unclear in which gene the causative variant lies. TNRC9 contains
a putative HMG (high mobility group) box motif, suggesting that
it might act as a transcription factor.

Figure 3 | The FGFR2 locus. a, Map of the whole
FGFR2 gene, viewed relative to common SNPs on
HapMap. The gene is 126 kb long and in reverse
3'-5' orientation on chromosome 10. Exon
positions are illustrated with respect to the 67
SNPs with m.a.f. > 5% in HapMap CEU
(therefore the map is not to physical scale).
Numbered SNPs are those tested in the genome-
wide study. SNPs in black were not significant in
stage 1. Those in red were significant at
P < 0.0001 after stage 2. rs10510097 (orange) was
significant in stage 1, but failed quality control in
stage 2 owing to deviation from Hardy-Weinberg
equilibrium. Squares indicate pairwise r² on a
greyscale (black = 1, white = 0). Red circle
indicates rs2981582. b, Resequenced 32 kb
region, shown relative to SNPs in CEU with
m.a.f. > 5%, showing pairwise LD for SNPs in
HapMap CEU (left panel) and JPT/CHB (right
panel). Red circle indicates rs2981582, shown in
bold black. c, Sequence conservation of 32 kb
region in five species, relative to human sequence
(http://pipeline.lbl.gov/methods.shtml) (ref. 35). Red
circle indicates rs2981582. SNPs in grey are those
used in the initial tagging of known common
HapMap SNPs within the block. SNPs in black
are correlated with rs2981582 with r² > 0.6 in
European samples. Six SNPs in red were those
consistent with being the causative variant on the
basis of the genetic data (not excluded at odds of
100:1 relative to the SNP with the strongest
association, rs7895676).

Pattern of risks 

We assessed in more detail, in the stage 3 data, the pattern of the
risks associated with the five independent SNPs that reached an overall
P < 10⁻⁷: rs2981582 (FGFR2), rs3803662 (TNRC9/LOC643714),
rs889312 (MAP3K1), rs13281615 (8q) and rs3817198 (LSP1). For each
of these five SNPs, the minor allele in Europeans was associated with an
increased risk of breast cancer in a dose-dependent manner, with a
higher risk of breast cancer in homozygous than in heterozygous carriers.
Simple dominant and recessive models could be rejected for each
SNP (all P = 0.02 or less). There was a marked difference in allele
frequencies between populations, with the risk-associated alleles of
rs8051542, rs889312 and rs13281615 being the major allele in Asian
populations. The per-allele odds ratio associated with rs2981582 was
significantly smaller, though still elevated, in the Asian versus European
populations (P = 0.04 for difference in odds ratio). This difference is
consistent with the hypothesis that rs2981582 is not the functional
variant at the FGFR2 locus, and was not seen for SNPs exhibiting stronger
evidence in the fine-scale mapping. No other evidence for heterogeneity
in the per-allele odds ratio among studies was observed (Fig. 2).

Three of the SNPs (rs2981582, rs3803662 and rs889312) also
showed evidence of association with breast CIS (Supplementary
Table 9). For rs2981582 and rs3803662, the estimated odds ratios were
greater for a diagnosis of breast cancer before age 40 years, but the
trends by age were not statistically significant (Supplementary Table
10). There was evidence of an association with family history of breast
cancer for three SNPs: for rs2981582 (P = 0.02), rs3803662 (P = 0.03)
and rs13281615 (P = 0.05), the susceptibility allele was commoner in
women with a first-degree relative with the disease than in those
without (Supplementary Table 11). rs2981582 was also associated
with bilaterality (P = 0.02). The associations with family history and
bilaterality are to be expected for susceptibility loci, and are similar to
previous observations for alleles in CHEK2 and ATM (refs 10, 12, 14).

Discussion 

This study has identified five novel breast cancer susceptibility loci, 
and demonstrated conclusively that some of the variation in breast 
cancer risk is due to common alleles. None of the loci we identified 
had been previously reported in association studies. Most previously 
identified breast cancer susceptibility genes are involved in DNA 
repair, and many association studies in breast cancer have concen- 
trated on genes in DNA repair and sex hormone synthesis and meta- 
bolism pathways. None of the associations reported here appear to 
relate to genes in these pathways. It is notable that three of the five loci 
contain genes related to control of cell growth or to cell signalling, but 
only one (FGFR2) had a clear prior relevance to breast cancer. These 
results should, therefore, open up new avenues for basic research. 

Our results emphasize the critical importance of study size in gen- 
etic association studies. It is notable that none of the confirmed asso- 
ciations reached genome-wide significance after stage 1 and only one 
reached this level after stage 2. As most common cancers have similar 
familial relative risks to breast cancer, it is likely that similarly large 
studies will be required to identify common alleles for other cancers. 
The fine-scale mapping of the FGFR2 locus demonstrates that, even 
with a clear association, identification of the causative variant can be 
extremely problematic. However, the use of studies from multiple 
populations with different patterns of LD can substantially reduce 
the number of variants that need to be subjected to functional analysis. 

As these susceptibility alleles are very common, a high proportion of
the general population are carriers of at-risk genotypes. For example,
approximately 14% of the UK population and 19% of UK breast
cancer cases are homozygous for the rare allele at rs2981582. On the
other hand, the increased risks associated with these alleles are
relatively small: on the basis of UK population rates, the estimated breast
cancer risk by age 70 years for rare homozygotes at rs2981582 is 10.5%,
compared to 6.7% in heterozygotes and 5.5% in common homozygotes.
At this stage, it is unlikely that these SNPs will be appropriate for
predictive genetic testing, either alone or in combination with each
other. However, as further susceptibility alleles are identified, a
combination of such alleles together with other breast cancer risk factors
may become sufficiently predictive to be important clinically.
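
The 14% and 19% figures quoted above can be approximately reproduced from the Table 2 values for rs2981582 (m.a.f. 0.38, per-allele odds ratio 1.26). The sketch below is a back-of-envelope check, not the authors' calculation. It assumes Hardy-Weinberg proportions in the population, a multiplicative per-allele model, and that the odds ratio approximates the relative risk. The function names are hypothetical.

```python
def genotype_frequencies(maf):
    """Population genotype frequencies under Hardy-Weinberg equilibrium (assumed)."""
    q = maf
    p = 1.0 - q
    return {"common_hom": p * p, "het": 2.0 * p * q, "rare_hom": q * q}


def case_genotype_frequencies(maf, per_allele_or):
    """Genotype frequencies among cases under a multiplicative per-allele model,
    treating the odds ratio as a relative risk (an approximation)."""
    population = genotype_frequencies(maf)
    relative_risk = {
        "common_hom": 1.0,
        "het": per_allele_or,
        "rare_hom": per_allele_or ** 2,
    }
    total = sum(population[g] * relative_risk[g] for g in population)
    return {g: population[g] * relative_risk[g] / total for g in population}


population = genotype_frequencies(0.38)
cases = case_genotype_frequencies(0.38, 1.26)
# population["rare_hom"] is about 0.144 and cases["rare_hom"] is about 0.190
```

The enrichment of rare homozygotes among cases is modest because most of the population carries at least one copy of the allele, which is the point the paragraph makes about limited predictive value.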

On the basis of the relative risk estimates from stage 3, and assuming
that the five most significant loci interact multiplicatively on disease
risk, these loci explain an estimated 3.6% of the excess familial risk of
breast cancer. On the basis of our staged design and the estimated
distribution of linkage disequilibrium between the typed SNPs and
those in HapMap, we estimate that the power to identify the five most
significant associations at P < 10⁻⁷ (rs2981582, rs3803662, rs889312,
rs13281615 and rs3817198) was 93%, 71%, 25%, 3% and 1% respectively.
These estimates are uncertain, notably because the true coverage
of HapMap SNPs is unknown. Nevertheless, these calculations indicate
that the power to detect the two strongest associations was high, and
suggest that there are likely to be few other common variants with a
similar effect on variation in breast cancer risk to rs2981582. In contrast,
the low power to detect rs13281615 and rs3817198 suggests that
these variants may represent a much larger class of loci, each explaining
of the order of 0.1% of the familial risk of breast cancer. An example of
such a locus is provided by CASP8 D302H, which showed strong
evidence of association in a previous large study (ref. 15). This SNP was tested
in stage 1, but the association was missed because it did not reach the
threshold for testing in stage 2. The excess of associations after stage 2 is
also consistent with the existence of many such loci. In addition,
because the coverage for SNPs with m.a.f. < 10% was low, many low
frequency alleles may have been missed. The detection of further
susceptibility loci will require genome-wide studies with more complete
coverage and using larger numbers of cases and controls, together with
the combination of results across multiple studies. The present study
demonstrates that common susceptibility loci can be reliably identified,
and that they may together explain an appreciable fraction of
the genetic variance in breast cancer risk.

METHODS SUMMARY 

Cases for stage 1 were identified through clinical genetics centres in the UK and a
national study of bilateral breast cancer. Cases in stage 2 were drawn from a
population-based study of breast cancer (SEARCH) (ref. 32). Controls for stages 2 and 3
were drawn from EPIC-Norfolk, a population-based study of diet and cancer (ref. 33).

Cases and controls for stage 3 were identified through case-control studies in
Europe, North America, South-East Asia and Australia participating in the
Breast Cancer Association Consortium (Supplementary Table 2) (ref. 34).

Genotyping for stages 1 and 2 was conducted using high-density oligonucleotide
microarrays. For the main analyses, we excluded samples called on <80% of
SNPs in either stage. We also excluded SNPs that achieved a call rate of <90% in
stage 1 and <95% in stage 2, and SNPs whose frequency deviated from Hardy-
Weinberg equilibrium in controls at P < 0.00001. Genotyping for stage 3, and for
the fine-scale mapping of the FGFR2 locus, was conducted using either a 5'
nuclease assay (Taqman, Applied Biosystems) or MALDI-TOF mass spectrometry
using the Sequenom iPLEX system. For each centre, we excluded any
sample called on <80% of SNPs, and any SNP with a call rate of <95% or a
deviation from Hardy-Weinberg equilibrium in controls at P < 0.00001. Tests
of association were 1 d.f. Cochran-Armitage tests, stratified for stage, centre and
ethnic group (European or Asian). Odds ratios for each SNP were estimated
using stratified logistic regression, using the stage 3 data only.
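
For readers unfamiliar with the 1 d.f. Cochran-Armitage trend test, a minimal unstratified sketch follows. It is a simplification of what the Methods Summary describes: the authors stratified by stage, centre and ethnic group, which is not done here. The additive allele-count scores (0, 1, 2) and the function name are assumptions made for illustration.

```python
import math


def cochran_armitage_trend(case_counts, control_counts, scores=(0, 1, 2)):
    """Unstratified 1 d.f. Cochran-Armitage trend test.

    case_counts and control_counts give the numbers of subjects with 0, 1 and 2
    copies of the minor allele. Returns (chi-square statistic, P value).
    """
    cases_total = sum(case_counts)
    controls_total = sum(control_counts)
    genotype_totals = [r + s for r, s in zip(case_counts, control_counts)]
    n = cases_total + controls_total

    sum_w_cases = sum(w * r for w, r in zip(scores, case_counts))
    sum_w_all = sum(w * t for w, t in zip(scores, genotype_totals))
    sum_w2_all = sum(w * w * t for w, t in zip(scores, genotype_totals))

    # chi2 = N (N*sum(w r) - R*sum(w n))^2 / (R S (N*sum(w^2 n) - (sum(w n))^2))
    numerator = n * (n * sum_w_cases - cases_total * sum_w_all) ** 2
    denominator = cases_total * controls_total * (n * sum_w2_all - sum_w_all ** 2)
    chi2 = numerator / denominator
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square, 1 d.f.
    return chi2, p_value
```

With identical genotype distributions in cases and controls the statistic is zero; it grows as the proportion of cases rises with the number of minor alleles carried.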

Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 

Received 9 February; accepted 30 April 2007. 

Published online 27 May 2007; corrected 28 June 2007 (details online). 

1. Collaborative Group on Hormonal Factors in Breast Cancer. Familial breast
cancer: Collaborative reanalysis of individual data from 52 epidemiological
studies including 58 209 women with breast cancer and 101 986 women without
the disease. Lancet 358, 1389-1399 (2001).

2. Miki, Y. et al. A strong candidate for the breast and ovarian-cancer susceptibility
gene BRCA1. Science 266, 66-71 (1994).

3. Wooster, R. et al. Identification of the breast cancer susceptibility gene BRCA2.
Nature 378, 789-792 (1995).

4. Antoniou, A. et al. Average risks of breast and ovarian cancer associated with
mutations in BRCA1 or BRCA2 detected in case series unselected for family history:
A combined analysis of 22 studies. Am. J. Hum. Genet. 72, 1117-1130 (2003).

5. Smith, P. et al. A genome wide linkage search for breast cancer susceptibility
genes. Genes Chromosom. Cancer 45, 646-655 (2006).

6. Pharoah, P. D. P. et al. Polygenic susceptibility to breast cancer and implications
for prevention. Nature Genet. 31, 33-36 (2002).

7. Antoniou, A. C., Pharoah, P. D. P., Smith, P. & Easton, D. F. The BOADICEA model
of genetic susceptibility to breast and ovarian cancer. Br. J. Cancer 91, 1580-1590
(2004).

8. Rahman, N. et al. PALB2, which encodes a BRCA2-interacting protein, is a breast
cancer susceptibility gene. Nature Genet. 39, 165-167 (2007).

9. Thompson, D. et al. Cancer risks and mortality in heterozygous ATM mutation
carriers. J. Natl Cancer Inst. 97, 813-822 (2005).

10. Meijers-Heijboer, H. et al. Low-penetrance susceptibility to breast cancer due to
CHEK2*1100delC in noncarriers of BRCA1 or BRCA2 mutations. Nature Genet. 31,
55-59 (2002).

11. Erkko, H. et al. A recurrent mutation in PALB2 in Finnish cancer families. Nature
446, 316-319 (2007).

12. Renwick, A. et al. ATM mutations that cause ataxia-telangiectasia are breast
cancer susceptibility alleles. Nature Genet. 38, 873-875 (2006).

13. Seal, S. et al. Truncating mutations in the Fanconi anemia J gene BRIP1 are low-
penetrance breast cancer susceptibility alleles. Nature Genet. 38, 1239-1241 (2006).

14. The CHEK2 Breast Cancer Case-Control Consortium. CHEK2*1100delC and
susceptibility to breast cancer: A collaborative analysis involving 10,860 breast
cancer cases and 9,065 controls from ten studies. Am. J. Hum. Genet. 74,
1175-1182 (2004).

15. Cox, A. et al. A common coding variant in CASP8 is associated with breast cancer
risk. Nature Genet. 39, 352-358 (2007); corrigendum 39, 688 (2007).

16. Easton, D. F. How many more breast cancer predisposition genes are there?
Breast Cancer Res. 1, 1-4 (1999).

17. Kruglyak, L. & Nickerson, D. A. Variation is the spice of life. Nature Genet. 27,
234-236 (2001).

18. Hinds, D. A. et al. Whole-genome patterns of common DNA variation in three
human populations. Science 307, 1072-1079 (2005).

19. Satagopan, J. M., Verbel, D. A., Venkatraman, E. S., Offit, K. E. & Begg, C. B. Two-
stage designs for gene-disease association studies. Biometrics 58, 163-170 (2002).

20. Antoniou, A. C. & Easton, D. F. Polygenic inheritance of breast cancer: Implications
for design of association studies. Genet. Epidemiol. 25, 190-202 (2003).

21. Altshuler, D. et al. A haplotype map of the human genome. Nature 437, 1299-1320
(2005).

22. Devlin, B. & Roeder, K. Genomic control for association studies. Biometrics 55,
997-1004 (1999).

23. Thomas, D. C., Haile, R. W. & Duggan, D. Recent developments in genomewide
association scans: A workshop summary and review. Am. J. Hum. Genet. 77,
337-345 (2005).

24. Amundadottir, L. T. et al. A common variant associated with prostate cancer in
European and African populations. Nature Genet. 38, 652-658 (2006).

25. Yeager, M. et al. Genome-wide association study of prostate cancer identifies a
second risk locus at 8q24. Nature Genet. 39, 645-649 (2007).

26. Gudmundsson, J. et al. Genome-wide association study identifies a second
prostate cancer susceptibility variant at 8q24. Nature Genet. 39, 631-637 (2007).

27. Ferretti, V. et al. PReMod: a database of genome-wide mammalian cis-regulatory
module predictions. Nucleic Acids Res. 35, D122-D126 (2007).

28. Moffa, A. B., Tannheimer, S. L. & Ethier, S. P. Transforming potential of
alternatively spliced variants of fibroblast growth factor receptor 2 in human
mammary epithelial cells. Mol. Cancer Res. 2, 643-652 (2004).

29. Adnane, J. et al. BEK and FLG, two receptors to members of the FGF family, are
amplified in subsets of human breast cancers. Oncogene 6, 659-663 (1991).

30. Jang, J. H., Shin, K. H. & Park, J. G. Mutations in fibroblast growth factor receptor 2
and fibroblast growth factor receptor 3 genes associated with human gastric and
colorectal cancers. Cancer Res. 61, 3541-3543 (2001).

31. Greenman, C. et al. Patterns of somatic mutation in human cancer genomes.
Nature 446, 153-158 (2007).

32. Lesueur, F. et al. Allelic association of the human homologue of the mouse
modifier Ptprj with breast cancer. Hum. Mol. Genet. 14, 2349-2356 (2005).

33. Day, N. et al. EPIC-Norfolk: Study design and characteristics of the cohort. Br. J.
Cancer 80, 95-103 (1999).

34. Breast Cancer Association Consortium. Commonly studied SNPs and
breast cancer: Negative results from 12,000-32,000 cases and controls from
the Breast Cancer Association Consortium. J. Natl Cancer Inst. 98, 1382-1396
(2006).

35. Hubbard, T. et al. The Ensembl genome database project. Nucleic Acids Res. 30,
38-41 (2002).

Supplementary Information is linked to the online version of the paper at 
www.nature.com/nature. 



Acknowledgements The authors thank the women who took part in this research, 
and all the funders and support staff who made this study possible. The principal 
funding for this study was provided by Cancer Research UK. Detailed 
acknowledgements are provided in Supplementary Information. 

Author Contributions D.F.E., A.M.D., P.D.P.P., D.R.C. and B.A.J.P. designed the
study and obtained financial support. D.G.B. and D.R.C. directed the genotyping of
stages 1 and 2. D.F.E. and D.T. conducted the statistical analysis. K.A.P. and A.M.D.
coordinated the genotyping for stage 3 and the fine-scale mapping of the FGFR2
and TNRC9 loci. J.P.S. and J.Z. performed resequencing and analysis of the FGFR2
locus. K.A.P., S.A., C.S.H., R.B., C.A.H., L.K.K., B.E.H., L.L.M., P.B., S.S., V.G., F.O.,
C.-Y.S., P.-E.W. and H.-C.W. conducted genotyping for the fine-scale mapping. R.L., J.M.,
K.F. and K.B.M. provided bioinformatics support. D.E., D.G.E., J.P., O.F., N.J., S.S.,
M.R.S. and N.R. coordinated the studies used in stage 1. N.W. and N.E.D.
coordinated the EPIC study used in stages 1 and 2. The remaining authors
coordinated the studies in stage 3 and undertook genotyping in those studies.
D.F.E. drafted the manuscript, with substantial contributions from K.A.P., A.M.D.,
P.D.P.P. and B.A.J.P. All authors contributed to the final paper.

Author Information Reprints and permissions information is available at
www.nature.com/reprints. The authors declare no competing financial interests.
Correspondence and requests for materials should be addressed to D.F.E.
(d.easton@srl.cam.ac.uk).



Author affiliations: 1 CR-UK Genetic Epidemiology Unit, Department of Public Health 
and Primary Care and 2 Department of Oncology, University of Cambridge, Cambridge 
CB1 8RN, UK. 3 Perlegen Sciences, Inc., 2021 Stierlin Court, Mountain View, California 
94043, USA. 4 Laboratory of Population Genetics, US National Cancer Institute, 
Bethesda, Maryland 20892, USA. 5 EPIC, Department of Public Health and Primary 
Care, University of Cambridge, Cambridge CB1 8RN, UK. 6 MRC Dunn Clinical Nutrition 
Centre, Cambridge CB2 0XY, UK. 7 Cancer Research UK Cambridge Research Institute, 
Cambridge CB2 0RE, UK. 8 Department of Preventive Medicine, Keck School of 
Medicine, University of Southern California, Los Angeles, California 90033, USA. 
9 Epidemiology Program, Cancer Research Center of Hawaii, University of Hawaii, 
Honolulu, Hawaii 96813, USA. 10 International Agency for Research on Cancer, 150 
Cours Albert Thomas, Lyon 69008, France. 11 National Cancer Institute, Bangkok 
10400, Thailand. 12 Institute of Biomedical Sciences, Academia Sinica, Taipei 11529, 
Taiwan. 13 Wessex Clinical Genetics Service, Princess Anne Hospital, Southampton 
SO16 5YA, UK. 14 Regional Genetic Service, St Mary's Hospital, Manchester M13 0JH, 
UK. 15 London School of Hygiene and Tropical Medicine, London WC1E 7HT, UK, and 
Institute of Cancer Research, Sutton, Surrey SM2 5NG, UK. 16 Breakthrough Breast 
Cancer Research Centre, London SW3 6JB, UK. 17 Section of Cancer Genetics, Institute 
of Cancer Research, Sutton, Surrey SM2 5NG, UK. 18 Cancer Genome Project, 
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, 
Cambridge CB10 1SA, UK. 19 Queensland Institute of Medical Research, Brisbane, 
Queensland 4006, Australia. 20 Departments of Clinical Biochemistry and 21 Breast 
Surgery, Herlev and Bispebjerg University Hospitals, University of Copenhagen, 
DK-2730 Herlev, Denmark. 22 Division of Cancer Epidemiology and Genetics, National 
Cancer Institute, Rockville, Maryland 20852, USA. 23 Advanced Technology Center, 
National Cancer Institute, Gaithersburg, Maryland 20877, USA. 24 Cancer Center and 
M. Sklodowska-Curie Institute of Oncology, Warsaw 02781, Poland. 25 Nofer Institute 
of Occupational Medicine, Lodz 90950, Poland. 26 Departments of Obstetrics and 
Gynecology, and 27 Department of Oncology, Helsinki University Central Hospital, 
Helsinki 00029, Finland. 28 Seoul National University College of Medicine, Seoul 
151-742, Korea. 29 National Cancer Center, Goyang 411-769, Korea. 30 Ulsan University 
College of Medicine, Ulsan 680-749, Korea. 31 Program in Molecular and Genetic 
Epidemiology, Harvard School of Public Health, 677 Huntington Ave., Boston, 
Massachusetts 02115, USA. 32 Channing Laboratory, Brigham and Women's Hospital 
and Harvard Medical School, 181 Longwood Ave., Boston, Massachusetts 02115, USA. 
33 Department of Medical Epidemiology and Biostatistics, Karolinska Institute, 
Stockholm SE-171 77, Sweden. 34 Population Genetics, Genome Institute of Singapore, 
60 Biopolis Street, Singapore 138672, Republic of Singapore. 35 Department of 
Radiation Oncology and 36 Department of Gynecology and Obstetrics, Hannover 
Medical School, D-30625 Hannover, Germany. 37 Department of Surgery and 
38 Department of Medical Decision Making and 39 Departments of Human Genetics 
and Pathology, Leiden University Medical Center, Albinusdreef 2, 2333 ZA Leiden, the 
Netherlands. 40 Family Cancer Clinic, Department of Medical Oncology, Erasmus 
MC-Daniel den Hoed Cancer Center, Groene Hilledijk 301, 3075 EA Rotterdam, the 
Netherlands. 41 Radiation Epidemiology Branch, Division of Cancer Epidemiology and 
Genetics, National Cancer Institute, NIH, DHHS, Bethesda, Maryland 20892, USA. 
42 Environmental Health Sciences, University of Minnesota, Minneapolis, Minnesota 
55455, USA. 43 Institute for Cancer Studies and 44 Academic Unit of Surgical Oncology, 
Sheffield University Medical School, Sheffield S10 2RX, UK. 45 Mayo Clinic College of 
Medicine, Rochester, Minnesota 55905, USA. 46 VU University Medical Center, 1007 
MB Amsterdam, the Netherlands. 47 Department of Clinical Genetics and 48 Internal 
Medicine, Erasmus University, Rotterdam NL-3015-GE, the Netherlands. 49 Spanish 
National Cancer Centre (CNIO), Madrid E-28029, Spain. 50 Centre for Molecular, 
Environmental, Genetic and Analytical Epidemiology, University of Melbourne, 
Carlton, Victoria 3053, Australia. 51 Department of Preventive and Social Medicine, 
University of Otago, Dunedin 9001, New Zealand. 52 Cancer Epidemiology Centre, 
Cancer Council Victoria, Carlton, Victoria 3053, Australia. 53 Genetic Epidemiology 






©2007 Nature Publishing Group 



Laboratory, Department of Pathology, University of Melbourne, Parkville, Victoria 
3052, Australia. 54 Dr. Margarete Fischer-Bosch-Institute of Clinical Pharmacology, 
70376 Stuttgart and University of Tuebingen, 72074 Tuebingen, Germany. 
55 Deutsches Krebsforschungszentrum, Heidelberg 69120, Germany. 56 Evangelische 
Kliniken Bonn gGmbH Johanniter Krankenhaus, 53113 Bonn, Germany. 57 Peter 
MacCallum Cancer Centre, Melbourne, Victoria 3002, Australia. 58 Institute of Clinical 
Medicine, Pathology and Forensic Medicine, University of Kuopio, Kuopio FIN-70210, 
Finland. 59 Departments of Oncology and Pathology, University Hospital of Kuopio, 
Kuopio FIN-70211, Finland. 60 Department of Oncology, Vaasa Central Hospital, Vaasa 
65130, Finland. 



The SEARCH collaborators Craig Luccarini 1 , Don Conroy 1 , Mitul Shah 1 , Hannah 
Munday 1 , Clare Jordan 1 , Barbara Perkins 1 , Judy West 1 , Karen Redman 1 & Kristy Driver 1 . 
kConFab Morteza Aghmesheh 2 , David Amor 3 , Lesley Andrews 4 , Yoland Antill 5 , Jane 
Armes 6 , Shane Armitage 7 , Leanne Arnold 7 , Rosemary Balleine 8 , Glenn Begley 9 , John 
Beilby 10 , Ian Bennett 11 , Barbara Bennett 4 , Geoffrey Berry 12 , Anneke Blackburn 13 , 
Meagan Brennan 14 , Melissa Brown 15 , Michael Buckley 16 , Jo Burke 17 , Phyllis Butow 18 , 
Keith Byron 19 , David Callen 20 , Ian Campbell 21 , Georgia Chenevix-Trench 22 , Christine 
Clarke 23 , Alison Colley 24 , Dick Cotton 25 , Jisheng Cui 26 , Bronwyn Culling 27 , Margaret 
Cummings 28 , Sarah-Jane Dawson 5 , Joanne Dixon 29 , Alexander Dobrovic 30 , Tracy 
Dudding 31 , Ted Edkins 32 , Maurice Eisenbruch 33 , Gelareh Farshid 34 , Susan Fawcett 35 , 
Michael Field 36 , Frank Firgaira 37 , Jean Fleming 38 , John Forbes 39 , Michael 
Friedlander 40 , Clara Gaff 41 , Mac Gardner 41 , Mike Gattas 42 , Peter George 43 , Graham 
Giles 44 , Grantley Gill 45 , Jack Goldblatt 46 , Sian Greening 47 , Scott Grist 37 , Eric Haan 48 , 
Marion Harris 49 , Stewart Hart 50 , Nick Hayward 22 , John Hopper 51 , Evelyn Humphrey 17 , 
Mark Jenkins 52 , Alison Jones 7 , Rick Kefford 53 , Judy Kirk 54 , James Kollias 55 , Sergey 
Kovalenko 56 , Sunil Lakhani 57 , Jennifer Leary 54 , Jacqueline Lim 58 , Geoff Lindeman 59 , 
Lara Lipton 60 , Liz Lobb 61 , Mariette Maclurcan 62 , Graham Mann 23 , Deborah Marsh 63 , 
Margaret McCredie 64 , Michael McKay 49 , Sue Anne McLachlan 65 , Bettina Meiser 4 , 
Roger Milne 26 , Gillian Mitchell 49 , Beth Newman 66 , Imelda O'Loughlin 67 , Richard 
Osborne 51 , Lester Peters 68 , Kelly Phillips 5 , Melanie Price 62 , Jeanne Reeve 69 , Tony 
Reeve 70 , Robert Richards 71 , Gina Rinehart 72 , Bridget Robinson 73 , Barney Rudzki 74 , 
Elizabeth Salisbury 75 , Joe Sambrook 21 , Christobel Saunders 76 , Clare Scott 5 , Elizabeth 
Scott 77 , Rodney Scott 31 , Ram Seshadri 37 , Andrew Shelling 78 , Melissa Southey 26 , 
Amanda Spurdle 22 , Graeme Suthers 48 , Donna Taylor 79 , Christopher Tennant 58 , 
Heather Thorne 21 , Sharron Townshend 46 , Kathy Tucker 4 , Janet Tyler 4 , Deon Venter 80 , 
Jane Visvader 81 , Ian Walpole 46 , Robin Ward 82 , Paul Waring 30 , Bev Warner 83 , Graham 
Warren 67 , Elizabeth Watson 67 , Rachael Williams 84 , Judy Wilson 85 , Ingrid Winship 69 
& Mary Ann Young 49 . AOCS Management Group David Bowtell 86 , Adele Green 22 , 
Anna deFazio 87 , Georgia Chenevix-Trench 22 , Dorota Gertig 51 & Penny Webb 22 . 

Consortia affiliations: 1 Department of Oncology, University of Cambridge, Cambridge 
CB1 8RN, UK. 2 Oncology Research Centre, Prince of Wales Hospital, Randwick, New 
South Wales 2031, Australia. 3 Genetic Health Services Victoria, Royal Children's 
Hospital, Melbourne, Victoria 3050, Australia. 4 Hereditary Cancer Clinic, Prince of 
Wales Hospital, Randwick, New South Wales 2031, Australia. 5 Department of 
Haematology and Medical Oncology, Peter MacCallum Cancer Centre, St Andrew's 
Place, East Melbourne, Victoria 3002, Australia. 6 Anatomical Pathology, Royal Women's 
Hospital, Carlton, Victoria 3053, Australia. 7 Molecular Genetics Laboratory, Royal 
Brisbane and Women's Hospital, Herston, Queensland 4029, Australia. 8 Departments 
of Translational and Medical Oncology, Westmead Hospital, Westmead, New South 
Wales 2145, Australia. 9 Cancer Biology Laboratory, TVW Institute for Child Health 
Research, Subiaco, Western Australia 6008, Australia. 10 Pathology Centre, Queen 
Elizabeth Medical Centre, Nedlands, Western Australia 6009, Australia. 11 Silverton 
Place, 101 Wickham Terrace, Brisbane, Queensland 4000, Australia. 12 Department of 
Public Health and Community Medicine, University of Sydney, Sydney, New South Wales 
2006, Australia. 13 John Curtin School of Medical Research, Australian National 
University, PO Box 334, Canberra, Australian Capital Territory 2601, Australia. 14 NSW 
Breast Cancer Institute, PO Box 143, Westmead, New South Wales 2145, Australia. 
15 Department of Biochemistry, University of Queensland, St. Lucia, Queensland 4072, 
Australia. 16 Molecular and Cytogenetics Unit, Prince of Wales Hospital, Randwick, New South 
Wales 2031, Australia. 17 Royal Hobart Hospital, GPO Box 1061L, Hobart, Tasmania 7001, 
Australia. 18 Medical Psychology Unit, Royal Prince Alfred Hospital, Camperdown, New 
South Wales 2204, Australia. 19 Australian Genome Research Facility, Walter & Eliza Hall 
Medical Research Institute, Royal Melbourne Hospital, Parkville, Victoria 3050, 
Australia. 20 Dame Roma Mitchell Cancer Research Laboratories, University of Adelaide/ 
Hanson Institute, PO Box 14, Rundle Mall, South Australia 5000, Australia. 21 Peter 
MacCallum Cancer Centre, St Andrew's Place, East Melbourne, Victoria 3002, Australia. 
22 Queensland Institute of Medical Research, Herston, Queensland 4006, Australia. 
23 Westmead Institute for Cancer Research, University of Sydney, Westmead Hospital, 
Westmead, New South Wales 2145, Australia. 24 Department of Clinical Genetics, 
Liverpool Health Service, PO Box 103, Liverpool, New South Wales 2170, Australia. 
25 Mutation Research Centre, St Vincent's Hospital, Victoria Parade, Fitzroy, Victoria 
3065, Australia. 26 Centre for Genetic Epidemiology, The University of Melbourne, Level 
2, 723 Swanston Street, Carlton, Victoria 3053, Australia. 27 Molecular and Clinical 
Genetics, Level 1 Building 65, Royal Prince Alfred Hospital, Camperdown, New South 
Wales 2050, Australia. 28 Department of Pathology, University of Queensland Medical 
School, Herston, Queensland 4006, Australia. 29 Central Regional Genetic Services, 
Wellington Hospital, Private Bag 7902, Wellington South 6039, New Zealand. 
30 Molecular Department of Pathology, Peter MacCallum Cancer Centre, St Andrew's 
Place, East Melbourne, Victoria 3002, Australia. 31 Hunter Genetics, Hunter Area Health 
Service, Waratah, New South Wales 2310, Australia. 32 Clinical Chemistry, Princess 
Margaret Hospital for Children, Box D184, Perth, Western Australia 6001, Australia. 
33 Department of Multicultural Health, University of Sydney, New South Wales 2052, 
Australia. 34 Tissue Pathology, Institute of Medical & Veterinary Science, Adelaide, South 
Australia 5000, Australia. 35 Family Cancer Clinic, Monash Medical Centre, Clayton, 
Victoria 3168, Australia. 36 Faculty of Medicine, Royal North Shore Hospital, Vindin 
House, St Leonards, New South Wales 2065, Australia. 37 Department of Haematology, 
Flinders Medical Centre, Bedford Park, South Australia 5042, Australia. 38 Eskitis 
Institute of Cell & Molecular Therapies, School of Biomolecular and Biomedical Sciences, 
Griffith University, Nathan, Queensland 4111, Australia. 39 Surgical Oncology, University 
of Newcastle, Newcastle Mater Hospital, Waratah, New South Wales 2298, Australia. 
40 Department of Medical Oncology, Prince of Wales Hospital, Randwick, New South 
Wales 2031, Australia. 41 Victorian Clinical Genetics Service, Royal Melbourne Hospital, 
Parkville, Victoria 3052, Australia. 42 Queensland Clinical Genetic Service, Royal 
Children's Hospital, Bramston Terrace, Herston, Queensland 4020, Australia. 43 Clinical 
Biochemistry Unit, Canterbury Health Labs, PO Box 151, Christchurch 8140, New 
Zealand. 44 Cancer Epidemiology Centre, The Cancer Council Victoria, 1 Rathdowne 
Street, Carlton, Victoria 3053, Australia. 45 Department of Surgery, Royal Adelaide 
Hospital, Adelaide, South Australia 5000, Australia. 46 Genetic Services of WA, King 
Edward Memorial Hospital, 374 Bagot Road, Subiaco, Western Australia 6008, 
Australia. 47 Wollongong Hereditary Cancer Clinic, Wollongong Public Hospital, Private 
Mail Bag 8808, South Coast Mail Centre, New South Wales 2521, Australia. 
48 Department of Medical Genetics, Women's and Children's Hospital, North Adelaide, 
South Australia 5006, Australia. 49 Familial Cancer Clinic, Peter MacCallum Cancer 
Centre, St Andrew's Place, East Melbourne, Victoria 3002, Australia. 50 Breast and 
Ovarian Cancer Genetics, Monash Medical Centre, 871 Centre Road, Bentleigh East, 
Victoria 3165, Australia. 51 Centre for Molecular Environmental, Genetic & Analytic 
Epidemiology, University of Melbourne, Melbourne, Victoria 3010, Australia. 52 School of 
Population Health, The University of Melbourne, 723 Swanston Street, Carlton, Victoria 
3053, Australia. 53 Medical Oncology, Westmead Hospital, Westmead, New South 
Wales 2145, Australia. 54 Familial Cancer Service, Department of Medicine, Westmead 
Hospital, Westmead, New South Wales 2145, Australia. 55 Breast Endocrine and Surgical 
Unit, Royal Adelaide Hospital, North Terrace, South Australia 5000, Australia. 
56 Molecular Pathology Department, Southern Cross Pathology, Monash Medical Centre, 
Clayton, Victoria 3168, Australia. 57 Molecular and Cellular Pathology, The University of 
Queensland, Herston, Queensland 4006, Australia. 58 Department of Psychological 
Medicine, Royal North Shore Hospital, St Leonards, New South Wales 2065, Australia. 
59 Breast Cancer Laboratory, Walter and Eliza Hall Institute, PO Royal Melbourne 
Hospital, Parkville, Victoria 3050, Australia. 60 Medical Oncology and Clinical 
Haematology Unit, Western Hospital, Footscray, Victoria 3011, Australia. 61 WA Centre 
for Cancer, Edith Cowan University, Churchlands, Western Australia 6018, Australia. 
62 Department of Psychological Medicine, University of Sydney, New South Wales 2006, 
Australia. 63 Kolling Institute of Medical Research, Royal North Shore Hospital, St 
Leonards, New South Wales 2065, Australia. 64 Cancer Epidemiology Research Unit, 
NSW Cancer Council, 153 Dowling Street, Woolloomooloo, New South Wales 2011, 
Australia. 65 Department of Oncology, St Vincent's Hospital, 41 Victoria Parade, Fitzroy, 
Victoria 3065, Australia. 66 School of Public Health, Queensland University of 
Technology, Victoria Park, Kelvin Grove, Queensland 4059, Australia. 67 St Vincent's 
Breast Clinic, PO Box 4751, Toowoomba, Queensland 4350, Australia. 68 Radiation 
Oncology, Peter MacCallum Cancer Centre, St Andrew's Place, East Melbourne, Victoria 
3002, Australia. 69 Genetic Services, Auckland Hospital, Private Bag 92024, Auckland 
1142, New Zealand. 70 Cancer Genetics Laboratory, University of Otago, PO Box 56, 
Dunedin 9054, New Zealand. 71 Department of Cytogenetics and Molecular Genetics, 
Women and Children's Hospital, Adelaide, South Australia 5006, Australia. 72 Hancock 
Family Breast Cancer Foundation, PO Locked Bag 2, West Perth, Western Australia 
6005, Australia. 73 Oncology Service, Christchurch Hospital, Private Bag 4710, 
Christchurch 8140, New Zealand. 74 Molecular Pathology Institute of Medical and 
Veterinary Science, Frome Road, Adelaide, South Australia 5000, Australia. 75 Section of 
Cytology, Institute of Clinical Pathology and Medical Research, Westmead Hospital, 
Westmead, New South Wales 2145, Australia. 76 School of Surgery and Pathology, QEII 
Medical Centre, M block 2nd Floor, Nedlands, Western Australia 6907, Australia. 
77 South View Clinic, Suite 13, Level 3 South Street, Kogarah, New South Wales 2217, 
Australia. 78 Department of Obstetrics and Gynaecology, University of Auckland, Private 
Bag 92019, Auckland 1142, New Zealand. 79 Department of Radiology, Royal Perth 
Hospital, Box X2213, Perth 6011, Western Australia, Australia. 80 Murdoch Institute, 
Royal Children's Hospital, Parkville, Victoria 3050, Australia. 81 Molecular Genetics of 
Cancer Division, Walter & Eliza Hall Medical Research Institute, Royal Melbourne 
Hospital, Parkville, Victoria 3050, Australia. 82 Department of Medical Oncology, St 
Vincent's Hospital, Darlinghurst, New South Wales 2010, Australia. 83 Cabrini Hospital, 
183 Wattletree Road, Malvern, Victoria 3144, Australia. 84 Family Cancer Clinic, St 
Vincent's Hospital, Darlinghurst, New South Wales 2010, Australia. 85 Medical 
Psychology Research Unit, Royal North Shore Hospital, St Leonards, New South Wales 
2065, Australia. 86 Cancer Genomics & Biochemistry Laboratory, Peter MacCallum 
Cancer Centre, St Andrew's Place, East Melbourne, Victoria 3002, Australia. 
87 Obstetrics & Gynaecology, Westmead Hospital, University of Sydney, New South 
Wales 2006, Australia. 






METHODS 

Subjects. Cases in stage 1 were identified through clinical genetics centres in 
Cambridge (n = 91), Manchester (96) and Southampton (136), and a national 
study of bilateral breast cancer (85). Cases were women diagnosed with invasive 
breast cancer under the age of 60 years who had a family history score of at least 2, 
where the score was computed as the total number of first-degree relatives plus 
half the number of second-degree relatives affected with breast cancer. The score 
for women with bilateral breast cancer was increased by 1, so that women were 
eligible if they were diagnosed with bilateral breast cancer and had one affected 
first-degree relative. Cases known to carry a BRCA1 or BRCA2 mutation were 
excluded. Controls were selected from the EPIC-Norfolk study, a population- 
based cohort study of diet and cancer based in Norfolk, East Anglia, UK 33 . 
Controls were chosen to be women aged over 50 years and free of cancer at 
the time of entry. Genotyping was attempted on 408 cases, plus 32 duplicate 
case samples, and 400 controls. For the analysis in Table 1, 54 samples with 
genotype call rates <80% were excluded, so the final analyses were based on 
390 cases and 364 controls. The minimum genotype call rate for the remaining 
samples was 89%. The overall genotype discordance rate between duplicate 
samples in stage 1 was 0.01%. 
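
The stage 1 eligibility rule described above can be sketched in a few lines of Python. This is only an illustration of the stated scoring rule: the function names and the `threshold` parameter are hypothetical, and the sketch assumes that "under the age of 60 years" is a strict inequality.

```python
def family_history_score(first_degree, second_degree, bilateral=False):
    """Affected first-degree relatives plus half the affected second-degree
    relatives, increased by 1 if the case herself has bilateral disease."""
    score = first_degree + 0.5 * second_degree
    if bilateral:
        score += 1
    return score


def eligible_stage1_case(age_at_diagnosis, first_degree, second_degree,
                         bilateral=False, brca_carrier=False, threshold=2.0):
    """Apply the stage 1 case criteria as read from the text (an assumption)."""
    if brca_carrier or age_at_diagnosis >= 60:
        return False
    score = family_history_score(first_degree, second_degree, bilateral)
    return score >= threshold
```

Under this reading, a woman with bilateral breast cancer and one affected first-degree relative scores 2 and is eligible, whereas the same woman with unilateral disease scores 1 and is not.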

For stage 2, invasive breast cancer cases were drawn from SEARCH, a popu- 
lation-based study of cancer in East Anglia 32 . Controls were women selected 
from the EPIC-Norfolk study, as previously described 33 . Eighty-eight subjects 
who were also genotyped in stage 1, and 35 controls who subsequently developed 
breast cancer and were also in the case series, were excluded from the analysis, 
leaving 3,990 breast cancer cases and 3,916 controls, plus five duplicates. The 
overall rate of discordance of genotypes between duplicate samples in stage 2 was 
0.008%. 

Twenty-one additional studies were included in stage 3 (see Supplementary 
Table 2). These studies participated through the Breast Cancer Association 
Consortium, an ongoing collaboration among investigators conducting case- 
control association studies in breast cancer 15 - 33 . All studies provided information 
on disease status (invasive breast cancer, carcinoma in situ or control), age at 
diagnosis/observation, ethnic group, first-degree family history of breast cancer 
and bilaterality of breast cancer. One further study (Breast Cancer Study of 
Taiwan) was included in the fine-scale mapping of the FGFR2 locus. 
Genotyping. For stage 1, genotyping was performed on 200 ng DNA that was 
first subjected to whole genome amplification using Multiple Displacement 
Amplification (MDA) 36 . Samples were then genotyped for a set of 266,732 
SNPs using high-density oligonucleotide, photolithographic microarrays at 
Perlegen Sciences. For stage 2, genotyping was performed using 2.5 µg genomic 
DNA. These samples were genotyped for a set of 13,023 SNPs selected on the 
basis of the stage 1 results, using a custom designed oligonucleotide array. For 
both stages, each SNP was interrogated by 24 25-mer oligonucleotide probes 
synthesized by photolithography on a glass substrate. The 24 features comprise 4 
sets of 6 features interrogating the neighbourhoods of SNP reference and 
alternative alleles on forward and reverse strands. Each allele and strand is represented 
by five offsets: -2, -1, 0, 1 and 2 indicating the position of the SNP within the 
25-mer, with zero being at the thirteenth base. At offset 0 a quartet was tiled, 
which included the perfect match to reference and alternative SNP alleles, and 
the two remaining nucleotides as mismatch probes. When possible, the mis- 
match features were selected as a purine nucleotide substitution for a purine 
perfect match nucleotide and a pyrimidine nucleotide substitution for a pyri- 
midine perfect match nucleotide. Thus, each strand and allele tiling consisted of 
6 features comprising five perfect match probes and one mismatch. 

Individual genotypes were determined by clustering all SNP scans in the two- 
dimensional space defined by reference and alternative trimmed mean intens- 
ities, corrected for background. Allele frequencies were approximated using the 
intensities collected from the high-density oligonucleotide arrays. An SNP's 
allele frequency, p, was estimated as the ratio of the relative amount of the 
DNA with reference allele to the total amount of DNA. The p value was com- 
puted from the trimmed mean intensities of perfect match features, after sub- 
tracting a measure of background computed from trimmed means of intensities 
of mismatch features. The trimmed mean disregarded the highest and the lowest 
intensity from the five perfect match intensities before computing the arithmetic 
mean. For the mismatch features, the trimmed mean is the individual intensity of 
the specified mismatch feature. 
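
A minimal sketch of the intensity calculation described above, for a single strand. The text does not say how the two strands are combined or how negative background-corrected signals are handled, so the clamping at zero and the function names here are assumptions made for illustration only.

```python
def trimmed_mean(intensities):
    """Drop the single highest and single lowest value, then average the rest."""
    if len(intensities) < 3:
        raise ValueError("need at least three intensities")
    kept = sorted(intensities)[1:-1]
    return sum(kept) / len(kept)


def allele_frequency(ref_pm, alt_pm, ref_mm, alt_mm):
    """Estimate p as reference signal over total signal.

    ref_pm, alt_pm: the five perfect-match intensities for each allele.
    ref_mm, alt_mm: the single mismatch intensity used as background.
    Negative corrected signals are clamped to zero (an assumption).
    """
    ref = max(trimmed_mean(ref_pm) - ref_mm, 0.0)
    alt = max(trimmed_mean(alt_pm) - alt_mm, 0.0)
    total = ref + alt
    if total == 0:
        return None  # no usable signal for this SNP scan
    return ref / total
```

With five perfect-match probes, the trimmed mean is simply the average of the three middle intensities, which limits the influence of a single saturated or failed probe.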

The genotype clustering procedure was an iterative algorithm developed as a 
combination of K-means and constrained multiple linear regressions. The 
K-means at each step re-evaluated the cluster membership representing distinct 
diploid genotypes. The multiple linear regressions minimized the variance in p 
within each cluster while optimizing the regression lines' common intersect. The 
common intersect defined a measure of common background that was used to 
adjust the allele frequencies for the next step of K-means. The K-means and 
multiple linear regression steps were iterated until the cluster membership and 
background estimates converged. The best number of clusters was selected by 
maximizing the total likelihood over the possible cluster counts of 1, 2 and 3 
(representing the combinations of the three possible diploid genotypes). The 
total likelihood was composed of data likelihood and model likelihood. The data 
likelihood was determined using a normal mixture model for the distribution of 
p around the cluster means. The model likelihood was calculated using a prior 
distribution of expected cluster positions, resulting in optimal p positions of 0.8 
for the homozygous reference cluster, 0.5 for the heterozygous cluster and 0.2 for 
the homozygous alternative cluster. 
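
The following is a deliberately simplified sketch of the K-means step only: plain one-dimensional K-means on the estimated p values, seeded at the prior cluster positions of 0.8, 0.5 and 0.2 quoted above. It is not the authors' algorithm. It omits the constrained regressions, the background adjustment and the likelihood-based choice of cluster count, and the function name is hypothetical.

```python
def kmeans_1d(values, centres=(0.8, 0.5, 0.2), max_iter=100):
    """Assign each p value to the nearest centre and update the centres
    until neither the labels nor the centres change."""
    centres = list(centres)
    labels = [0] * len(values)
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centres)), key=lambda k: abs(v - centres[k]))
            for v in values
        ]
        new_centres = []
        for k in range(len(centres)):
            members = [v for v, lab in zip(values, new_labels) if lab == k]
            # An empty cluster keeps its previous centre.
            new_centres.append(sum(members) / len(members) if members else centres[k])
        if new_labels == labels and new_centres == centres:
            break
        labels, centres = new_labels, new_centres
    return labels, centres
```

Here label 0 corresponds to the homozygous reference cluster, 1 to the heterozygous cluster and 2 to the homozygous alternative cluster, following the order of the seed positions.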

A genotyping quality metric was compiled for each genotype from 15 input 
metrics that described the quality of the SNP and the genotype. The genotyping 
quality metric correlated with a probability of having a discordant call between 
the Perlegen platform and outside genotyping platforms (that is, non-Perlegen 
HapMap project genotypes). A system of 10 bootstrap aggregated regression 
trees was trained using an independent data set of concordance data between 
Perlegen genotypes and HapMap project genotypes. The trained predictor was 
then used to predict the genotyping quality for each of the genotypes in this data 
set. Genotypes with quality scores of less than 7 were discarded. Data were 
analysed for 227,876 SNPs in stage 1 and 12,026 (of 13,023 selected) in stage 
2, for which the call rate was >80%. 

The 12,711 SNPs for stage 2 were primarily selected on the basis of a 1 d.f. 
Cochran-Armitage trend test (11,809, all with P < 0.052). We also included 826 
SNPs with P < 0.01 testing for the difference in frequency of either homozygote 
between cases and controls (that is, assuming either a dominant or recessive 
model) and 76 SNPs that achieved P < 0.01 on a Cochran-Armitage test, weighting 
individuals by their family history score as above. 

For the main analyses, we discarded SNPs with a call rate <90% in stage 1 and 
95% in stage 2, and SNPs with a deviation from Hardy-Weinberg equilibrium 
significant at P < 0.00001 in either stage, leaving 205,586 SNPs in stage 1 and 
10,621 SNPs in stage 2. 
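
The text does not state which test of Hardy-Weinberg equilibrium was applied. As a sketch under that caveat, the filter above could be implemented with a 1 d.f. Pearson chi-squared test on the three genotype counts; the function names are hypothetical, and an exact test would be a reasonable alternative for rare alleles.

```python
import math


def hwe_chi2_test(n_aa, n_ab, n_bb):
    """Return (chi2, P) for a 1 d.f. Pearson test of Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    if p == 0 or q == 0:
        return 0.0, 1.0  # monomorphic: no deviation can be tested
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-squared variable with 1 d.f.
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def passes_main_qc(call_rate, hwe_p, min_call_rate, hwe_threshold=1e-5):
    """Keep a SNP if its call rate and HWE P value both clear the thresholds."""
    return call_rate >= min_call_rate and hwe_p >= hwe_threshold
```

For example, `passes_main_qc(call_rate, hwe_p, 0.90)` would apply the stage 1 threshold and `passes_main_qc(call_rate, hwe_p, 0.95)` the stage 2 threshold.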

The 30 SNPs included in the stage 3 analyses were initially selected on the basis 
of a combined analysis of stage 1 and stage 2. We included all SNPs achieving a 
combined P < 0.00002 (based on either the Cochran-Armitage or 2 d.f. test, see 
below). Following re-evaluation of the stage 2 genotyping by 5' nuclease assay 
(Taqman, Applied Biosystems) using the ABI PRISM 7900HT (Applied 
Biosystems), and exclusion of some samples, 16 of these SNPs were significant 
at P < 0.00002 and 24 at P < 0.0002 (Supplementary Table 3). One additional 
SNP, rs3803662, was added as a result of fine-scale mapping of the TNRC9/ 
LOC643714 locus. 

The 31 stage 3 SNPs were genotyped in 22 studies (including cases and con- 
trols from SEARCH not used in stage 2, together with 21 other studies). For 18 of 
the studies, genotyping was performed by 5' nuclease assay (Taqman) using the 
ABI PRISM 7900HT or 7500 Sequence Detection Systems according to manu- 
facturer's instructions. Primers and probes were supplied directly by Applied 
Biosystems (http://www.appliedbiosystems.com/) as Assays-by-Design. All 
assays were carried out in 384-well or 96-well format, with each plate including 
negative controls (with no DNA). Duplicate genotypes were provided for at least 
2% of samples in each study. For three studies, SNPs were genotyped using 
matrix assisted laser desorption/ionization time of flight mass spectrometry 
(MALDI-TOF MS) for the determination of allele-specific primer extension 
products using Sequenom's MassARRAY system and iPLEX technology. The 
design of oligonucleotides was carried out according to the guidelines of 
Sequenom and performed using MassARRAY Assay Design software (version 
1.0). Multiplex PCR amplification of amplicons containing SNPs of interest was 
performed using Qiagen HotStart Taq Polymerase on a Perkin Elmer GeneAmp 
2400 thermal cycler (MJ Research) with 5 ng genomic DNA. Primer extension 
reactions were carried out according to manufacturer's instructions for iPLEX 
chemistry. Assay data were analysed using Sequenom TYPER software (version 
3.0). One study used both the Taqman and MALDI-TOF MS approaches. The 
SNPs genotyped in stage 3 were also regenotyped in the stage 2 samples using 
Taqman; these genotype calls were used in the overall analyses (Table 2, 
Supplementary Table 3, and Fig. 2). 

We eliminated any sample that could not be scored on 20% of the SNPs 
attempted. We also removed data for any centre/SNP combination for which 
the call rate was less than 90%. In any instances where the call rate was 90-95%, 
the clustering of genotype calls was re-evaluated by an independent observer to 
determine whether the clustering was sufficiently clear for inclusion. We also 
eliminated all the data for a given SNP/centre where the reproducibility in 
duplicate samples was <97%, or where there was marked deviation from 
Hardy-Weinberg equilibrium in the controls (P < 0.00001). 
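
The stage 3 quality-control rules above can be summarized as a small decision function. This is a sketch of one reading of the text: the function names and return labels are hypothetical, the 90-95% band is treated as inclusive of both ends, and a sample is dropped when it fails on 20% or more of the SNPs attempted.

```python
def keep_sample(n_failed, n_attempted, max_failed_fraction=0.20):
    """Keep a sample unless it failed on 20% or more of the SNPs attempted."""
    return n_failed / n_attempted < max_failed_fraction


def classify_centre_snp(call_rate, duplicate_concordance, hwe_p_controls):
    """Return 'exclude', 'review' or 'include' for one centre/SNP combination."""
    if call_rate < 0.90:
        return "exclude"
    if duplicate_concordance < 0.97:
        return "exclude"
    if hwe_p_controls < 1e-5:
        return "exclude"
    if call_rate <= 0.95:
        # Cluster plots in this band were re-evaluated by an independent observer.
        return "review"
    return "include"
```

The "review" outcome stands in for the manual inspection of genotype clusters, which cannot be reduced to a threshold.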
Fine-scale mapping of FGFR2. Initial tagging of the associated region was done 
by identifying all SNPs with an m.a.f. > 5% in the HapMap CEPH/CEU set 
(Utah residents with ancestry from northern and western Europe). We then 
selected 7 SNPs (in addition to rs2981582) that tagged these variants with a 
pairwise r² > 0.8, using the program Tagger 
(http://www.broad.mit.edu/mpg/tagger/) 37 . To identify additional common 
variants within the 32.5 kb region of 
linkage around the associated SNP, we resequenced 45 lymphocyte DNA samples 
from a subset of European subjects also genotyped by HapMap and other pub- 
licly available data sets. Seventy overlapping PCR amplicons were designed from 
positions 123317613 to 123348192 of chromosome 10 (average amplicon size 
650 bp, 160 bp overlap). M13-tagged PCR products were bidirectionally 
sequenced using Big Dye 3.0 (Applied Biosystems) and processed using automated 
trace analysis through the Cancer Genome Workbench (cgwb.nci.nih.gov). 
Eighty-six per cent of the nucleotides across the region could be scored for 
polymorphisms in at least 80% of subjects. This set gave a >97% probability of 
detecting a variant with an m.a.f. > 5%. One hundred and seventeen variants 
were identified, including 27 present in dbSNP but without individual genotype 
information in European subjects, and an additional 46 not in dbSNP. 
Individual genotype information was then compared and merged with publicly 
available genotypes from Caucasian subjects (HapMap release 21 for 60 CEU 
parents, 22 European subjects from the Environmental Genome Project (EGP) 
resequencing effort (http://egp.gs.washington.edu/data/fgfr2/), and 24 Euro- 
pean subjects from Perlegen (retrieved through http://gvs.gs.washington.edu/ 
GVS)). There were 2 discrepancies among 389 genotype calls among subjects 
in common between our resequencing effort and EGP or Perlegen data, and 10 
out of 926 compared to HapMap genotypes. 
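
The source does not show how the >97% detection probability was derived. One plausible reading, sketched below, is that a variant is detected if at least one copy of the minor allele is present among the chromosomes scored, with chromosomes treated as independent draws. Taking 80% of the 45 resequenced subjects (36 subjects, 72 chromosomes) and an m.a.f. of 5% gives a value consistent with the quoted figure. The function name is hypothetical.

```python
def detection_probability(maf, n_subjects):
    """P(at least one minor allele among 2 * n_subjects chromosomes),
    assuming independent chromosomes."""
    return 1.0 - (1.0 - maf) ** (2 * n_subjects)


n_scored = 45 * 80 // 100          # 36 subjects scored at a given position
prob = detection_probability(0.05, n_scored)
print(f"{prob:.3f}")
```

Variants with a higher minor allele frequency would be detected with higher probability under the same assumptions.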

On the basis of these data, we identified 28 SNPs correlated with rs2981582 
with r² > 0.6. We then attempted to genotype these 28 SNPs, plus rs2981582, in a 
subset of 80 controls from SEARCH and 84 controls from the Seoul Breast 
Cancer Study. Twenty-two of the variants were genotyped using Taqman. 
Four further variants (rs34032268, rs2912778, rs2912781 and rs7895676), which 
were not amenable to Taqman, were genotyped by Pyrosequencing (Biotage; 
http://www.biotagebio.com/). Assays were designed using Pyrosequencing 
Assay Design Software 1.0. The remaining 2 SNPs (rs35393331 and 
rs33971856) could not be genotyped using either technology and were excluded 
from further analyses. We cannot therefore comment on their likelihood of being 
the causal variant. Using these data, we selected tagging sets of 11 SNPs for UK 
subjects and 14 SNPs for Korean subjects (including rs2981582), such that each 
of the remaining variants was correlated with a tagging SNP with r² > 0.95 in the 
UK study or r² > 0.86 in the Korean study. After genotyping the 11 tag SNPs in 
SEARCH, two of these SNPs (rs4752569 and rs35012336) showed strong evidence 
against being the causative variant and were not considered further. The 
remaining 12 tag SNPs from the Korean subset were then genotyped in the 
samples from the IARC-Thai Breast Cancer Study, the Breast Cancer Study in 
Taiwan and the Multi-Ethnic Cohort (MEC), by Taqman. 
Statistical methods. The primary test used for each SNP was a Cochran- 
Armitage 1 d.f. score test for association between disease status and allele dose. 
In the combined analysis, we performed a stratified Cochran-Armitage test. 
Stage 1 was given a weight of 4 in this analysis (corresponding to a weight of 2 
in the score statistic), to allow for the expected greater effect size given the 
inclusion of cases with a family history. In the stage 3 analyses, each study was 
treated as a separate stratum, except for the MEC, in which the European 
American and Japanese American subgroups were treated as separate strata. 
For all studies except the MEC, individuals from a minor ethnic group for that 
study were excluded. Per-allele and genotype-specific odds ratios, and confid- 
ence intervals, were estimated using logistic regression, adjusting for the same 
strata. The summary odds ratios in Fig. 2 are based on the data from the stage 3 
studies only, to avoid the bias inherent in estimates from the stage 1 and 2 data 
for SNPs exhibiting an association (the so called 'winner's curse'). The effects of 
genotype on family history of breast cancer (first degree yes/no) and bilaterality 
were examined by treating these variables as outcomes in a stratified Cochran- 
Armitage test. 
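
A minimal sketch of a stratified Cochran-Armitage 1 d.f. trend test, using allele doses 0, 1 and 2. The per-stratum score weights are one reading of the statement that stage 1 received a weight of 2 in the score statistic (and hence 4 in its variance); the function names and the variance convention (dividing by N rather than N - 1) are assumptions, not taken from the source.

```python
import math


def trend_score(cases, controls, doses=(0, 1, 2)):
    """Return (U, V): the trend score and its variance for one stratum.

    cases, controls: genotype counts ordered by allele dose.
    """
    totals = [r + s for r, s in zip(cases, controls)]
    n = sum(totals)
    r_tot = sum(cases)
    s_tot = n - r_tot
    sum_xr = sum(x * r for x, r in zip(doses, cases))
    sum_xn = sum(x * t for x, t in zip(doses, totals))
    sum_x2n = sum(x * x * t for x, t in zip(doses, totals))
    u = sum_xr - r_tot * sum_xn / n
    v = r_tot * s_tot / n ** 2 * (sum_x2n - sum_xn ** 2 / n)
    return u, v


def stratified_trend_test(strata, score_weights=None):
    """Combine strata as (sum of w*U)^2 / (sum of w^2*V); return (chi2, P)."""
    if score_weights is None:
        score_weights = [1.0] * len(strata)
    u = v = 0.0
    for (cases, controls), w in zip(strata, score_weights):
        u_k, v_k = trend_score(cases, controls)
        u += w * u_k
        v += w * w * v_k
    if v == 0:
        raise ValueError("no genotype variation in any stratum")
    chi2 = u * u / v
    # Upper tail of a chi-squared variable with 1 d.f.
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```

A combined stage 1 and stage 2 analysis would then be called as `stratified_trend_test([stage1, stage2], score_weights=[2, 1])`, where each stage is a pair of case and control genotype counts.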

To assess the global significance of the SNPs in stage 3, we computed the sum 
of the χ² trend statistics (excluding the 6 SNPs reaching genome-wide 
significance, plus rs2107425 as it was in LD with rs3817198) over those SNPs (17 of 23) 
for which the estimated odds ratios in stage 3 were in the same direction as the 
combined stage 1/stage 2 (ref. 38). Under the null hypothesis of no association, the 
asymptotic distribution of this statistic is χ² with n degrees of freedom, where 
n has a binomial distribution with parameters 23 and 1/2. The significance of this 
statistic was then assessed by computing a weighted sum of the tails of the 
relevant χ² distributions. 
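
As a rough sketch of the weighted tail sum described above: each χ² tail with n degrees of freedom is weighted by the binomial probability of n out of 23 SNPs agreeing in direction. The function names are hypothetical, and treating the n = 0 term as contributing nothing for a positive statistic is an assumption of this sketch, not a detail given in the text.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: finite Poisson-type sum
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: normal tail plus a finite series
    tail = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    total = 0.0
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return tail + total

def mixture_p_value(stat, n_snps=23, prob=0.5):
    """Binomial-weighted sum of chi-square tails over n = 1..n_snps."""
    p = 0.0
    for n in range(1, n_snps + 1):
        weight = math.comb(n_snps, n) * prob ** n * (1 - prob) ** (n_snps - n)
        p += weight * chi2_sf(stat, n)
    return p
```

A call such as `mixture_p_value(observed_sum)` would then give the global significance, where `observed_sum` stands for the summed trend statistics (a placeholder name; the observed value is not given in this section).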

For the fine-scale mapping of the FGFR2 locus, we first derived haplotype 
frequencies using the haplo.stats package in S-plus 39 , separately for the European 
and Asian populations, using data from the case-control studies on whom the tag 
SNPs were typed plus the 164 control individuals on whom all SNPs were typed. 
These were used to impute genotype probabilities for each identified SNP in each 
individual. We then used an EM algorithm to fit a logistic regression model 
assuming that each SNP in turn was the causal variant, allowing for uncertainty 
in the genotypes of untyped SNPs, and hence to determine the likelihood that 
each SNP was the causal variant. 

Coverage of the stage 1 tagging set was estimated using HapMap phase II as a 
reference. We based estimates on 2,116,183 SNPs with an m.a.f. of >5% in the 
CEU population. Of the SNPs successfully genotyped in stage 1, 187,663 were 
also on HapMap. For those SNPs not on HapMap, we identified 'surrogate' SNPs 
that were in perfect LD based on genotyping of 24 Caucasians by Perlegen 
Sciences (269,203 SNPs) (ref. 18). To estimate coverage, we determined the best 
pairwise r² for each HapMap SNP and each tag SNP or a surrogate SNP, using the 
HapMap CEU data. This coverage was summarized in terms of the distribution 
of r² by allele frequency in 10 categories. 

To estimate the power to detect each of the associations found, we computed 
the non-centrality parameter for the test statistic at each stage, based on the 
per-allele relative risk, allele frequency and r². This was used to estimate the power for 
a given r², based on a simulated trivariate normal distribution for the score 
statistics after each stage to allow for the correlations in the test statistics. We 
assumed a cut-off of P < 0.05 for stage 1, P < 0.00002 for stage 2 and P < 10⁻⁷ 
for stage 3 (the first is slightly conservative, as more SNPs than this were actually 
taken forward). The overall power was obtained by averaging the power 
estimates for each r² over the distribution of r² obtained from the HapMap data, 
applicable to a SNP of that frequency. 

The expected number of significant associations after stage 2 (Table 1) was 
calculated using a bivariate normal distribution for the joint distribution of the 
(weighted) Cochran-Armitage score statistics after stage 1 and after both stages, 
using a correlation of 0.525 between the two statistics (reflecting the weighted 
sizes of the two studies). These calculations were based on the 205,586 SNPs 
reaching the required quality control in stage 1. Of these, 11,313 reached a 
P<0.05, of which 7,405 (65.5%) were successfully genotyped to the required 
quality control in stage 2. Thus the expected number reaching a given 
significance level with good quality control was calculated from the total number 
expected to reach this level × 65.5%. We adjusted the variances of the test 
statistics, separately for stages 1 and 2, using the genomic control method". 
The adjustment factor, λ, was estimated from the median of the smallest 90% 
of the test statistics for SNPs typed in that stage, divided by the predicted median 
for the smallest 90% of a sample of χ²₁ distributions (that is, the 45% percentile 
of a χ²₁ distribution, 0.375). 

36. Dean, F. B. et al. Comprehensive human genome amplification using 
multiple displacement amplification. Proc. Natl Acad. Sci. USA 99, 5261-5266 
(2002). 

37. de Bakker, P. I. W. et al. Efficiency and power in genetic association studies. Nature 
Genet. 37, 1217-1223 (2005). 

38. Tyrer, J., Pharoah, P. D. P. & Easton, D. F. The admixture maximum likelihood test: 
A novel experiment-wise test of association between disease and multiple SNPs. 
Genet. Epidemiol. 30, 636-643 (2006). 

39. Schaid, D. J., Rowland, C. M., Tines, D. E., Jacobson, R. M. & Poland, G. A. Score 
tests for association between traits and haplotypes when linkage phase is 
ambiguous. Am. J. Hum. Genet. 70, 425-434 (2002). 



©2007 Nature Publishing Group 



LETTERS 

nature genetics 

Two variants on chromosome 17 confer prostate cancer 
risk, and the one in TCF2 protects against type 2 diabetes 

Julius Gudmundsson 1,30, Patrick Sulem 1,30, Valgerdur Steinthorsdottir 1, Jon T Bergthorsson 1, 
Gudmar Thorleifsson 1, Andrei Manolescu 1, Thorunn Rafnar 1, Daniel Gudbjartsson 1, Bjarni A Agnarsson 2, 
Adam Baker 1, Asgeir Sigurdsson 1, Kristrun R Benediktsdottir 2, Margret Jakobsdottir 1, Thorarinn Blondal 1, 
Simon N Stacey 1, Agnar Helgason 1, Steinunn Gunnarsdottir 1, Adalheidur Olafsdottir 1, Kari T Kristinsson 1, 
Birgitta Birgisdottir 1, Shyamali Ghosh 1, Steinunn Thorlacius 1, Dana Magnusdottir 1, Gerdur Stefansdottir 1, 
Kristleifur Kristjansson 1, Yu Bagger 3, Robert L Wilensky 4, Muredach P Reilly 4, Andrew D Morris 5, 
Charlotte H Kimber 6, Adebowale Adeyemo 7, Yuanxiu Chen 7, Jie Zhou 7, Wing-Yee So 8, Peter C Y Tong 8, 
Maggie C Y Ng 8, Torben Hansen 9, Gitte Andersen 9, Knut Borch-Johnsen 9-11, Torben Jorgensen 11, 
Alejandro Tres 12,13, Fernando Fuertes 14, Manuel Ruiz-Echarri 12, Laura Asin 13, Berta Saez 13, Erica van Boven 15, 
Siem Klaver 16, Dorine W Swinkels 16, Katja K Aben 17, Theresa Graif 18, John Cashy 18, Brian K Suarez 19, 
Onco van Vierssen Trip 20, Michael L Frigge 1, Carole Ober 21, Marten H Hofker 22,23, Cisca Wijmenga 24,25, 
Claus Christiansen 3, Daniel J Rader 4, Colin N A Palmer 6, Charles Rotimi 7, Juliana C N Chan 8, 
Oluf Pedersen 9,10, Gunnar Sigurdsson 26,27, Rafn Benediktsson 26,27, Eirikur Jonsson 28, 
Gudmundur V Einarsson 28, Jose I Mayordomo 12,13, William J Catalona 18, Lambertus A Kiemeney 29, 
Rosa B Barkardottir 2, Jeffrey R Gulcher 1, Unnur Thorsteinsdottir 1, Augustine Kong 1 & Kari Stefansson 1 



We performed a genome-wide association scan to search for 
sequence variants conferring risk of prostate cancer using 
1,501 Icelandic men with prostate cancer and 11,290 controls. 
Follow-up studies involving three additional case-control 
groups replicated an association of two variants on 
chromosome 17 with the disease. These two variants, 33 Mb 
apart, fall within a region previously implicated by family-based 
linkage studies on prostate cancer. The risks conferred 
by these variants are moderate individually (allele odds ratio 
of about 1.20), but because they are common, their joint 
population attributable risk is substantial. One of the variants is 
in TCF2 (HNF1β), a gene known to be mutated in individuals 
with maturity-onset diabetes of the young type 5. Results from 
eight case-control groups, including one West African and one 
Chinese, demonstrate that this variant confers protection 
against type 2 diabetes. 



1 deCODE genetics, Sturlugata 8, 101 Reykjavik, Iceland. 2 Department of Pathology, Landspitali-University Hospital, 101 Reykjavik, Iceland. 3 Center for Clinical and 
Basic Research A/S, DK-2750 Ballerup, Denmark. 4 University of Pennsylvania School of Medicine, Philadelphia, PA 19104, USA. 5 Division of Medicine and 
Therapeutics, Ninewells Hospital and Medical School, Dundee DD1 9SY, Scotland. 6 Population Pharmacogenetics Group, Biomedical Research Centre, Ninewells 
Hospital and Medical School, Dundee DD1 9SY, Scotland. 7 National Human Genome Center, Howard University, Department of Community and Family Medicine, 
Washington DC 20060, USA. 8 Department of Medicine and Therapeutics, The Chinese University of Hong Kong, Prince of Wales Hospital, Shatin, NT, Hong Kong. 
9 Steno Diabetes Center, DK-2820 Copenhagen, Denmark. 10 Faculty of Health Science, University of Aarhus, DK-8000 Aarhus, Denmark. 11 Research Centre for 
Prevention and Health, Glostrup University Hospital, DK-2600 Glostrup, Denmark. 12 Division of Medical Oncology, Lozano Blesa University Hospital, University of 
Zaragoza, 50009 Zaragoza, Spain. 13 The Institute of Health Sciences, Nanotechnology Institute of Aragon, 50009 Zaragoza, Spain. 14 Division of Radiation Oncology, 
Lozano Blesa University Hospital, University of Zaragoza, 50009 Zaragoza, Spain. 15 Department of Urology, Maas Ziekenhuis, 5830 AB Boxmeer, The Netherlands. 
16 Department of Clinical Chemistry, Radboud University Nijmegen Medical Center, 6500 HB Nijmegen, The Netherlands. 17 Comprehensive Cancer Center East, 6501 
BG Nijmegen and Department of Epidemiology and Biostatistics, Radboud University Nijmegen Medical Center, 6500 HB Nijmegen, The Netherlands. 18 Department 
of Urology, Northwestern University Feinberg School of Medicine, Chicago, Illinois 60611, USA. 19 Department of Psychiatry, Washington University School of 
Medicine, St. Louis, Missouri 63110, USA. 20 Department of Urology, Gelderse Vallei Hospital, 6716 RP Ede, The Netherlands. 21 Department of Human Genetics, 
University of Chicago, Chicago, Illinois 60637, USA. 22 Department of Pathology and Laboratory Medicine, University Medical Center Groningen, 9700 RB Groningen, 
The Netherlands. 23 Department of Molecular Genetics, Maastricht University, 6200 MD Maastricht, The Netherlands. 24 Department of Genetics, University Medical 
Center Groningen, 9700 RB Groningen, The Netherlands. 25 Complex Genetics Section, Department of Biomedical Genetics, University Medical Centre Utrecht, 3508 
AB Utrecht, The Netherlands. 26 Landspitali-University Hospital, 101 Reykjavik, Iceland. 27 Icelandic Heart Association, 201 Kopavogur, Iceland. 28 Department of 
Urology, Landspitali-University Hospital, 101 Reykjavik, Iceland. 29 Department of Epidemiology and Biostatistics and Department of Urology, Radboud University 
Nijmegen Medical Center, 6500 HB Nijmegen, The Netherlands. 30 These authors contributed equally to this work. Correspondence should be addressed to K.S. 
(kstefans@decode.is) or J.G. (julius.gudmundsson@decode.is). 



Received 21 March; accepted 8 May; published online 1 July 2007; doi:10.1038/ng2062 



NATURE GENETICS VOLUME 39 | NUMBER 8 | AUGUST 2007 



Figure 1 A schematic view of the genome-wide association results for chromosome 17q. Shown are 
results from the genome-wide association analysis performed in the Icelandic study population. The 
results plotted are for all Illumina Hap300 chip SNPs that are located between position 30 Mb and the 
telomere (~78.6 Mb; build 35) on the long arm of chromosome 17 (blue diamonds). The six SNP 
markers circled in red and listed in Table 2 all fall within the linkage region described in ref. 8. 



Firmly established risk factors for prostate cancer are age, ethnicity 
and family history. Despite a large body of evidence for a genetic 
component to the risk of prostate cancer, sequence variants on 8q24 
are the only common variants reported so far that account for 
a substantial proportion of cases (refs. 1-4). 

In the present study, we began with a genome-wide SNP association 
study, applying 310,520 SNPs from the Illumina Hap300 chip to 
search for sequence variants conferring risk of prostate cancer using 
Icelandic cases and controls. We expanded the data from a previously 
reported study (ref. 2) by increasing the number of cases from 1,453 to 1,501 
and the number of controls from 3,064 to 11,290. This corresponds to 
a 34% increase in effective sample size. Apart from the variants on 
8q24 (refs. 1,2) and SNPs correlated with them, no other SNPs 
achieved genome-wide significance (Supplementary Fig. 1 online). 
However, we assumed that a properly designed follow-up strategy 
would lead to the identification of additional susceptibility variants for 
prostate cancer. 
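
The quoted gain in effective sample size can be checked with a short calculation. The text does not say how effective sample size was defined; the sketch below assumes the common harmonic-mean form, n_cases × n_controls / (n_cases + n_controls), which reproduces the stated figure. The function name is hypothetical.

```python
def effective_size(n_cases, n_controls):
    """Effective sample size of a case-control comparison (assumed definition)."""
    return n_cases * n_controls / (n_cases + n_controls)

before = effective_size(1453, 3064)    # previously reported study
after = effective_size(1501, 11290)    # expanded data set
increase = after / before - 1.0
print(f"{increase:.0%}")  # prints 34%
```

Most of the gain comes from the added controls. Under this definition, adding controls alone can never push the effective size beyond the number of cases.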

Like others (ref. 5), we believe that results from family-based linkage 
studies should be taken into account when evaluating the association 
results of a genome-wide study. However, instead of using linkage 
scores to formally weight the statistical significance of different SNPs (ref. 5), 
we used them to prioritize follow-up studies. The long arm of 
chromosome 17 has been reported in several linkage studies of 
prostate cancer (refs. 6-8), but no susceptibility variants have yet been 
found (refs. 9-11). Hence, we decided to focus on this region first. 

We selected for further analysis six SNPs on chromosome 17q 
having the lowest P values (<5 × 10⁻⁴) and ranking from 68 to 100 
among the most significantly associated SNPs in our genome-wide 
analysis (Fig. 1). These SNPs mapped to two 
distinct regions on chromosome 17q that are 
both within a region with LOD scores ranging 
from 1-2 but outside the proposed 10-cM 
candidate gene region reported in a recent 
linkage analysis (ref. 8). One locus was on 17q12 
(rs7501939 and rs3760511), encompassing 
the 5' end of the TCF2 (HNF1β) gene, 
where the linkage disequilibrium (LD) is 
weak (based on the Utah CEPH (CEU) HapMap 
data set). The second locus is in a gene-poor 
area on 17q24.3 (rs1859962, rs7214479, 
rs6501455 and rs983085) where all four SNPs 
fall within a strong LD block (based on the 
CEU HapMap data set). The two loci are 
separated by approximately 33 Mb, and 
we did not observe any LD between them 
(see Supplementary Table 1 online for r² and 
D′ values). 

We genotyped five of the six SNPs in three 
prostate cancer case-control groups of European 
ancestry (Table 1). The assay for 
rs983085 on 17q24.3 failed in genotyping, 
but this SNP is almost perfectly correlated 
with rs6501455 (r² = 0.99) and is therefore 
expected to give comparable results. For each 
of the replication study groups, the observed 
effects of four of the five SNPs were in the 
same direction as in Iceland. One SNP, 
rs6501455, showed an opposite effect in the 
Chicago group. When results from all four 
case-control groups were combined, two 
SNPs achieved genome-wide significance, 
rs7501939 allele C (rs7501939 C) at 17q12 (odds ratio (OR) = 1.19, 
P = 4.7 × 10⁻⁹) and rs1859962 allele G (rs1859962 G) at 17q24.3 
(OR = 1.20, P = 2.5 × 10⁻¹⁰) (Tables 2 and 3). In an effort to refine 
the signal at the 17q12 locus, we selected three SNPs (rs4239217, 
rs757210, rs4430796) that were substantially correlated with rs7501939 
(r² > 0.5) based on the CEU HapMap data. One of these, rs4430796, 
showed an association to prostate cancer that was stronger than that of 
rs7501939. Specifically, with all groups combined, allele A of 
rs4430796 had an OR of 1.22 with a P of 1.4 × 10⁻¹¹ (Table 2). 
A joint analysis showed that the effects of rs7501939 and rs3760511 
were no longer significant after adjusting for rs4430796 (P = 0.88 and 
0.58, respectively), whereas rs4430796 remained significant after 
adjusting for both rs7501939 and rs3760511 (P = 0.0042). At 
17q24.3, our attempt at refining the signal did not result in any 
SNP that was more significant than rs1859962. Among the Illumina 
SNPs, rs7214479 and rs6501455 were not significant (P > 0.75) with 
adjustment for the effect of rs1859962, whereas rs1859962 remained 
significant after adjusting for the other two SNPs (P = 7.4 x \<T*). 
Henceforth, our focus was on rs4430796 at 17q12 and rs1859962 at 
17q24.3. However, at 17q12, because rs7501939 was a part of the 
original genome-wide scan, we have included it in the discussion when 
appropriate. For replication efforts, we recommend including at least 
the three abovementioned SNPs. We note that in the results released 
by the Cancer Genetic Markers of Susceptibility study group (see URL 
below), these three SNPs also show nominal, but not genome-wide, 
significant association with prostate cancer. 

For men with prostate cancer diagnosed at age 65 or younger, 
the observed OR from the combined analysis was slightly higher (1.30 



Table 1 Characteristics of men with prostate cancer and controls from four sources 

Study population             Affected       Controls   Aggressive a   Mean age at          Age at diagnosis 
                             individuals               (%)            diagnosis (range)    <65 years (%) 

Iceland                      1,501          11,290     50             70.8 (40-96)         22 
Nijmegen, The Netherlands    999            1,466      47             64.2 (43-83)         52 
Zaragoza, Spain              456            1,078      37             69.3 (44-83)         19 
Chicago                      537            514        48             59.6 (39-87)         70 
Total:                       3,493          14,348 

a 'Aggressive' is defined here as cancers with Gleason scores of 7 or higher and/or a stage of T3 or higher and/or node-positive 
disease and/or metastatic disease. 






Table 2 Association results for SNPs on 17q12 and prostate cancer in Iceland, The Netherlands, Spain and the US 

Study population (N cases/N controls)    Frequency 
and variant (allele)                     Cases    Controls    OR (95% c.i.)       P value 

Iceland (1,501/11,289) 
  rs7501939 (C)                          0.615    0.578       1.17 (1.08-1.27)    1.8 × 10⁻⁴ 
  rs3760511 (C)                          0.384    0.348       1.17 (1.08-1.27)    1.6 × 10⁻⁴ 
  rs4430796 (A)                          0.558    0.512       1.20 (1.11-1.31)    1.4 × 10⁻⁵ 
The Netherlands (997/1,464) 
  rs7501939 (C)                          0.648    0.589       1.29 (1.15-1.45)    2.4 × 10⁻⁵ 
  rs3760511 (C)                          0.362    0.338       1.11 (0.99-1.25)    0.086 
  rs4430796 (A)                          0.568    0.508       1.28 (1.14-1.43)    3.1 × 10⁻⁵ 
Spain (456/1,078) 
  rs7501939 (C)                          0.583    0.566       1.07 (0.92-1.26)    0.37 
  rs3760511 (C)                          0.277    0.257       1.11 (0.93-1.32)    0.25 
  rs4430796 (A)                          0.469    0.454       1.06 (0.91-1.24)    0.45 
Chicago (536/514) 
  rs7501939 (C)                          0.637    0.588       1.15 (1.03-1.47)    0.021 
  rs3760511 (C)                          0.347    0.294       1.28 (1.06-1.54)    9.4 × 10⁻³ 
  rs4430796 (A)                          0.563    0.477       1.41 (1.19-1.67)    9.4 × 10⁻⁵ 
All excluding Iceland (1,989/3,056) a 
  rs7501939 (C)                                   0.581       1.21 (1.12-1.32)    5.6 × 10⁻⁶ 
  rs3760511 (C)                                   0.296       1.15 (1.05-1.25)    2.4 × 10⁻³ 
  rs4430796 (A)                                   0.480       1.24 (1.14-1.35)    2.0 × 10⁻⁷ 
All combined (3,490/14,345) a 
  rs7501939 (C)                                   0.580       1.19 (1.12-1.26)    4.7 × 10⁻⁹ 
  rs3760511 (C)                                   0.309       1.16 (1.09-1.23)    1.4 × 10⁻⁶ 
  rs4430796 (A)                                   0.488       1.22 (1.15-1.30)    1.4 × 10⁻¹¹ 

All P values shown are two sided. Shown are the numbers of cases and controls (N), allelic frequencies of variants in affected and control individuals, the allelic odds ratio (OR) with 
95% confidence interval (95% c.i.) and P values based on the multiplicative model. 

a For the combined study populations, the reported control frequency was the average, unweighted control frequency of the individual populations, whereas the OR and the P values were estimated 
using the Mantel-Haenszel model. 
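
As an illustration of how the allelic odds ratios in Table 2 relate to the reported frequencies, the sketch below computes the odds ratio directly from the case and control allele frequencies. This is a simplification: the published estimates come from the multiplicative model fitted to genotype data, so values recomputed from rounded frequencies can differ in the last digit. The function name is hypothetical.

```python
def allelic_odds_ratio(freq_cases, freq_controls):
    """Odds of carrying the allele in cases divided by the odds in controls."""
    odds_cases = freq_cases / (1 - freq_cases)
    odds_controls = freq_controls / (1 - freq_controls)
    return odds_cases / odds_controls

# Icelandic frequencies from Table 2
print(round(allelic_odds_ratio(0.615, 0.578), 2))  # rs7501939 (C): prints 1.17
print(round(allelic_odds_ratio(0.384, 0.348), 2))  # rs3760511 (C): prints 1.17
```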





for rs4430796 A and 1.27 for rs1859962 G). For each copy of the 
at-risk alleles, carriers were diagnosed with prostate cancer 2 months 
younger for rs4430796 and 5 months younger for rs1859962, 
compared with noncarriers with prostate cancer. However, this 
observation was not statistically significant and therefore requires 
further investigation. 

We did not observe any interaction between the risk variants on 
17q12 and 17q24.3; a multiplicative or log-additive model provided an 
adequate fit for the joint risk of rs4430796 and rs1859962. We 
estimated genotype-specific ORs for each locus individually 
(Table 4). Based on results from all four groups, a multiplicative 
model for the genotype risk provided an adequate fit for rs4430796 at 
17q12. However, for rs1859962 at the 17q24.3 locus, the full model 
provided a significantly better fit than the multiplicative model 
(P = 0.006), a result driven mainly by the Icelandic samples. 
Specifically, the estimated OR of 1.33 for a heterozygous carrier of 
rs1859962 G was substantially higher than the 1.20 estimate implied 
by a multiplicative model. 

The SNPs rs7501939 and rs4430796 on 17q12 are located in the first 
and second intron of the TCF2 gene, respectively. To the best of our 
knowledge, sequence variants in TCF2 have not been previously 
implicated in the risk of prostate cancer. More than 50 different 
exonic TCF2 mutations have been reported in individuals with renal 
cysts, maturity-onset diabetes of the young type 5 (MODY5), pan- 
creatic atrophy and genital tract abnormalities 12,13 . We sequenced all 
nine exons of TCF2 in 200 Icelandic men with prostate cancer and 200 
Icelandic controls without detecting any mutations explaining our 
association signal (data not shown). 



Notably, several epidemiological studies have demonstrated an 
inverse relationship between type 2 diabetes (T2D) and the risk of 
prostate cancer (see ref. 14 and references therein). A recent meta- 
analysis estimated the relative risk of prostate cancer to be 0.84 (95% 
confidence interval (c.i.), 0.71-0.92) among diabetes patients 14 . There- 
fore, we decided to investigate a potential association between T2D 
and the SNPs in TCF2 showing the strongest association with prostate 
cancer in our data. 

We typed the Illumina SNP rs7501939 in 1,380 individuals with 
T2D (males in this group were not known to have prostate cancer, 
according to the Icelandic Cancer Registry list of individuals with 
prostate cancer diagnosed from 1955 to 2006). When compared with 
9,940 controls not known to have either prostate cancer or T2D, 
rs7501939 C showed a protective effect against T2D (OR = 0.88, 
P = 0.0045) in these samples. For the same samples, allele A of the 
refinement SNP rs4430796 gave a comparable result (OR = 0.86, 
P = 0.0021). To validate this association, we typed both rs7501939 and 
rs4430796 in seven additional T2D case-control groups of European, 
African and Asian ancestry (Supplementary Note online). In all seven 
case-control groups, rs7501939 C and rs4430796 A showed a protective 
effect against the disease (that is, an OR < 1.0). Combining results 
from all eight T2D case-control groups, including the Icelandic group, 
gave an OR of 0.91 (P = 9.2 × 10⁻⁷) for rs7501939 C and an OR of 
0.91 (P = 2.7 × 10⁻⁷) for rs4430796 A (Table 5). In a joint analysis, 
the effect of rs4430796 remained significant with adjustment for 
rs7501939 (P = 0.016), whereas rs7501939 did not after adjusting 
for rs4430796 (P = 0.41). We note that the former was mainly driven 
by the data from West Africa, where the correlation between the two 






Table 3 Association results for SNPs on 17q24.3 and prostate cancer in Iceland, The Netherlands, Spain and the US 

Study population (N cases/N controls)    Frequency 
and variant (allele)                     Cases    Controls    OR (95% c.i.)       P value 

Iceland (1,501/11,290) 
  rs1859962 (G)                          0.489    0.453       1.16 (1.07-1.26)    3.1 × 10⁻⁴ 
  rs7214479 (T)                          0.451    0.415       1.16 (1.07-1.26)    3.3 × 10⁻⁴ 
  rs6501455 (A)                          0.538    0.501       1.16 (1.07-1.26)    3.0 × 10⁻⁴ 
  rs983085 (C) a                         0.542    0.504       1.16 (1.07-1.26)    2.0 × 10⁻⁴ 
The Netherlands (999/1,466) 
  rs1859962 (G)                          0.522    0.456       1.30 (1.16-1.46)    6.8 × 10⁻⁶ 
  rs7214479 (T)                          0.474    0.428       1.20 (1.07-1.35)    1.5 × 10⁻³ 
  rs6501455 (A)                          0.544    0.488       1.25 (1.12-1.40)    1.1 × 10⁻⁴ 
Spain (456/1,078) 
  rs1859962 (G)                          0.512    0.476       1.15 (0.99-1.35)    0.071 
  rs7214479 (T)                          0.455    0.426       1.13 (0.96-1.32)    0.14 
  rs6501455 (A)                          0.581    0.552       1.13 (0.97-1.32)    0.13 
Chicago (537/510) 
  rs1859962 (G)                          0.513    0.456       1.25 (1.06-1.49)    9.8 × 10⁻³ 
  rs7214479 (T)                          0.460    0.416       1.20 (1.01-1.42)    0.041 
  rs6501455 (A)                          0.549    0.586       0.86 (0.72-1.02)    0.083 
All excluding Iceland (1,992/3,054) b 
  rs1859962 (G)                                   0.463       1.25 (1.15-1.35)    8.3 × 10⁻⁸ 
  rs7214479 (T)                                   0.423       1.18 (1.09-1.28)    7.0 × 10⁻⁵ 
  rs6501455 (A)                                   0.542       1.12 (1.05-1.20)    6.2 × 10⁻³ 
All combined (3,493/14,344) b 
  rs1859962 (G)                                   0.460       1.20 (1.14-1.27)    2.5 × 10⁻¹⁰ 
  rs7214479 (T)                                   0.421       1.17 (1.10-1.24)    8.1 × 10⁻⁸ 
  rs6501455 (A)                                   0.532       1.14 (1.08-1.21)    6.9 × 10⁻⁶ 

All P values shown are two sided. Shown are the numbers of cases and controls (N), allelic frequencies of variants in affected and control individuals, the allelic odds ratio (OR) with 
95% confidence interval (95% c.i.) and P values based on the multiplicative model. 
a SNPs rs983085 and rs6501455 were almost perfectly correlated (r² = 0.99), but rs983085 failed in genotyping in the non-Icelandic groups. b For the combined study populations, the reported 
control frequency was the average, unweighted control frequency of the individual populations, whereas the OR and the P values were estimated using the Mantel-Haenszel model. 
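
The Mantel-Haenszel combination mentioned in footnote b can be sketched as follows. Allele counts are reconstructed here as 2N × frequency from the rounded values in Table 3, which is an assumption of this sketch (the study worked from the actual genotype data), and the function name is hypothetical. For rs1859962 (G) in the three non-Icelandic groups the result comes out close to the 1.25 reported in the table.

```python
def mantel_haenszel_or(strata):
    """Combined allelic OR over strata.

    strata: iterable of (n_cases, n_controls, freq_cases, freq_controls).
    """
    num = den = 0.0
    for n_cases, n_controls, f_cases, f_controls in strata:
        a = 2 * n_cases * f_cases               # risk alleles in cases
        b = 2 * n_cases * (1 - f_cases)         # other alleles in cases
        c = 2 * n_controls * f_controls         # risk alleles in controls
        d = 2 * n_controls * (1 - f_controls)   # other alleles in controls
        total = a + b + c + d
        num += a * d / total
        den += b * c / total
    return num / den

# rs1859962 (G), values taken from Table 3
replication = [
    (999, 1466, 0.522, 0.456),   # The Netherlands
    (456, 1078, 0.512, 0.476),   # Spain
    (537, 510, 0.513, 0.456),    # Chicago
]
combined_or = mantel_haenszel_or(replication)
```

Each study is kept as its own stratum, so differences in allele frequency between populations do not bias the combined estimate.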



SNPs is substantially lower than in individuals of European ancestry 
(r² = 0.22 and r² = 0.77 in the Yoruba and CEU HapMap samples, 
respectively). For T2D, a recent report (ref. 15) describes similar findings 
(OR = 0.89, P = 5 × 10⁻⁶) for allele G of the SNP rs757210, which is 
substantially correlated with rs4430796 A (D′ = 0.96; r² = 0.62; based 
on the CEU HapMap data set). This reinforces the finding that one or 
more variants in TCF2 that confer risk of prostate cancer are 
protective against T2D. Notably, removing individuals with T2D 
from the Icelandic case-control group had minimal impact on the 
association of rs4430796 with prostate cancer (Supplementary Note). 

The more distal SNP, rs1859962, on chromosome 17q24.3 is in a 
177.5-kb LD block spanning positions 66.579 Mb to 66.757 Mb 


Table 4 Model-free estimates of the genotype OR of rs4430796 (A) at 17q12 and rs1859962 (G) at 17q24.3 

                                               Genotype OR a 
Study group and variant (allele)  Allelic OR   00   0X (95% c.i.)       XX (95% c.i.)       P value b     P value c      PAR 

Iceland 
  rs4430796 (A)                   1.20         1    1.12 (0.97-1.29)    1.40 (1.19-1.64)    0.31          8.3 × 10⁻⁵     0.14 
  rs1859962 (G)                   1.16         1    1.35 (1.18-1.54)    1.33 (1.13-1.57)    3.4 × 10⁻³    2.3 × 10⁻⁵     0.19 
All except Iceland 
  rs4430796 (A)                   1.24         1    1.34 (1.18-1.52)    1.56 (1.32-1.84)    0.12          4.5 × 10⁻⁷     0.23 
  rs1859962 (G)                   1.25         1    1.32 (1.17-1.49)    1.57 (1.33-1.84)    0.24          2.9 × 10⁻⁷     0.22 
All combined 
  rs4430796 (A)                   1.22         1    1.24 (1.13-1.36)    1.48 (1.32-1.66)    0.57          2.0 × 10⁻¹⁰    0.19 
  rs1859962 (G)                   1.20         1    1.33 (1.21-1.44)    1.45 (1.29-1.62)    6.0 × 10⁻³    5.1 × 10⁻¹¹    0.21 

PAR, population attributable risk; OR, odds ratio; 95% c.i., 95% confidence interval. 

a Genotype odds ratios for heterozygous (0X) and homozygous carriers (XX) compared with non-carriers (00). b Test of the multiplicative 
model (the null hypothesis) versus the full model (one degree of freedom). c Test of no effect (the null hypothesis) versus the full model 
(two degrees of freedom). 
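
The PAR column of Table 4 can be approximated from the genotype odds ratios and the control allele frequency. The sketch below assumes Hardy-Weinberg genotype proportions and uses the combined control frequencies from Tables 2 and 3 (0.488 for rs4430796 A, 0.460 for rs1859962 G); the text does not spell out the calculation, so this is one plausible reading rather than the authors' stated method. The function name is hypothetical.

```python
def population_attributable_risk(freq, or_het, or_hom):
    """PAR from genotype ORs relative to non-carriers, assuming Hardy-Weinberg proportions."""
    p_non = (1 - freq) ** 2
    p_het = 2 * freq * (1 - freq)
    p_hom = freq ** 2
    # Average risk in the population relative to non-carriers
    mean_risk = p_non + p_het * or_het + p_hom * or_hom
    return 1 - 1 / mean_risk

par_17q12 = population_attributable_risk(0.488, 1.24, 1.48)   # rs4430796 (A), all combined
par_17q24 = population_attributable_risk(0.460, 1.33, 1.45)   # rs1859962 (G), all combined
print(round(par_17q12, 2), round(par_17q24, 2))  # prints 0.19 0.21
```

Both values agree with the "All combined" PAR entries in the table to two decimal places.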






Table 5 Association results for SNPs in the TCF2 gene on 17q12 and type 2 diabetes 

Study population (N cases/N controls)    Frequency 
and variant (allele)                     Cases    Controls    OR (95% c.i.)       P value 

Iceland a (1,380/9,940) 
  rs7501939 (C)                          0.549    0.582       0.88 (0.80-0.96)    0.0045 
  rs4430796 (A)                          0.482    0.521       0.86 (0.78-0.95)    0.0021 
Denmark A (264/596) 
  rs7501939 (C)                          0.525    0.593       0.76 (0.62-0.93)    0.0088 
  rs4430796 (A)                          0.452    0.530       0.73 (0.60-0.90)    0.0032 
Denmark B (1,365/4,843) 
  rs7501939 (C)                          0.579    0.596       0.93 (0.85-1.02)    0.11 
  rs4430796 (A)                          0.507    0.528       0.92 (0.85-1.00)    0.062 
Philadelphia (457/967) 
  rs7501939 (C)                          0.569    0.613       0.83 (0.71-0.98)    0.028 
  rs4430796 (A)                          0.477    0.527       0.82 (0.70-0.96)    0.013 
Scotland (3,741/3,718) 
  rs7501939 (C)                          0.607    0.615       0.97 (0.91-1.03)    0.31 
  rs4430796 (A)                          0.517    0.526       0.97 (0.91-1.03)    0.29 
The Netherlands (367/915) 
  rs7501939 (C)                          0.563    0.579       0.94 (0.79-1.11)    0.46 
  rs4430796 (A)                          0.494    0.506       0.95 (0.79-1.14)    0.58 
Hong Kong (1,495/993) 
  rs7501939 (C)                          0.768    0.791       0.87 (0.76-1.00)    0.054 
  rs4430796 (A)                          0.731    0.754       0.89 (0.78-1.01)    0.073 
West Africa b (867/1,115) 
  rs7501939 (C)                          0.400    0.437       0.87 (0.77-0.99) 
  rs4430796 (A)                          0.271    0.313       0.80 (0.69-0.92)    0.0024 
All groups excluding Iceland 
  rs7501939 (C)                                               0.91 (0.87-0.95)    3.4 × 10⁻⁵ 
  rs4430796 (A)                                               0.92 (0.88-0.95)    1.8 × 10⁻⁵ 
All groups combined (9,936/23,087) 
  rs7501939 (C)                                               0.91 (0.87-0.94)    9.2 × 10⁻⁷ 
  rs4430796 (A)                                               0.91 (0.87-0.94)    2.7 × 10⁻⁷ 

All P values shown are two sided. Shown are the numbers of cases and controls (N), allelic frequencies of variants in affected and control individuals, the allelic odds ratio (OR) with 
95% confidence interval (95% c.i.) and P values based on the multiplicative model. 

a Men known to have prostate cancer were excluded from the Icelandic T2D group (both affected individuals and controls). b Results for the five West African tribes have been combined using a 
Mantel-Haenszel method. The frequency of the variant in West African affected individuals and controls is the weighted average over the five tribes. 



(National Center for Biotechnology Information (NCBI) build 35), 
based on the CEU HapMap group. The closest telomeric gene is SOX9, 
located ~900 kb away from the LD block. One mRNA (BC039327) 
and several unspliced ESTs have been localized to this region, but it 
does not contain any known genes (University of California Santa 
Cruz Genome Browser, May 2004 assembly). RT-PCR analysis of 
various cDNA libraries, including those derived from the prostate, 
detected expression of the BC039327 mRNA only in a testis library 
(data not shown), in line with previously reported results 16 . 

In summary, we have found that two common variants on 
chromosome 17q, rs4430796 A and rs1859962 G, contribute to the 
risk of prostate cancer in four populations of European descent. 
Together, based on the combined results, these two variants have an 
estimated joint population attributable risk (PAR) of ~36%, which is 
substantial from a public health viewpoint. The large PAR is a 
consequence of the high frequencies of these variants. However, as 
their relative risks, as estimated by the ORs, are not high, the sibling 
risk ratio (ref. 17) that they account for is only ~1.009 for each variant 
separately and ~1.018 jointly. As a consequence, they can explain only 
a small fraction of the familial clustering of the disease and can 
therefore generate only modest linkage scores. We were most intrigued 
that the variant in TCF2 is associated with increased risk of prostate 



cancer but reduced risk of T2D in individuals of European, African 
and Asian descent. The discovery of a sequence variant in the TCF2 
gene that accounts for at least part of the inverse relationship between 
these two diseases provides a step toward understanding the complex 
biochemical checks and balances that result from the pleiotropic 
impact of singular genetic variants. Previous explanations of the 
well-established inverse relationship between prostate cancer and 
T2D have centered on the impact of the metabolic and hormonal 
environment of diabetic men. However, we note that the protective 
effect of both the TCF2 SNPs against T2D is too modest for its impact 
on prostate cancer risk to be merely a by-product of its impact on 
T2D. Indeed, we favor the notion that the primary functional impact 
of rs4430796 (or a presently unknown correlated variant) is on one or 
more metabolic or hormonal pathways important for the normal 
functioning of individuals throughout their lives that incidentally 
modulate the risk of developing prostate cancer and T2D late in life. 

METHODS 

Icelandic study population. Men diagnosed with prostate cancer were identi- 
fied based on a nationwide list from the Icelandic Cancer Registry (ICR) that 
contained all 3,886 Icelandic prostate cancer patients diagnosed from January 1,
1955, to December 31, 2005. The Icelandic prostate cancer sample collection 



NATURE GENETICS VOLUME 39 | NUMBER 8 | AUGUST 2007 






LETTERS 



included 1,615 patients (diagnosed from December 1974 to December 2005) 
who were recruited from November 2000 until June 2006 out of the 1,968 
affected individuals who were alive during the study period (a participation 
rate of about 82%). A total of 1,541 affected individuals were included in a 
genome-wide SNP genotyping effort, using the Infinium II assay method and 
the Illumina Sentrix HumanHap300 BeadChip. Of these, 1,501 (97%) were 
successfully genotyped according to our quality control criteria (Supplemen- 
tary Methods online) and were used in the present case-control association 
analysis. The mean age at diagnosis for the consenting patients was 71 years
(median 71 years; range, 40-96 years), and the mean age at diagnosis was
73 years for all individuals with prostate cancer in the ICR. The median time
from diagnosis to blood sampling was 2 years (range, 0-26 years) (see ref. 1 for
a more detailed description of the Icelandic prostate cancer study population).
No significant difference was seen in frequencies of rs7501939 (C), rs4430796
(A) or rs1859962 (G) between men diagnosed before 1998 and those diagnosed
in 1998 or later (P = 0.74, P = 0.87 and P = 0.35, respectively). More
specifically, using only cases diagnosed in 1998 or later (N = 880) versus all our
controls (N = 11,289), we obtained OR values of 1.16 (P = 0.004), 1.20 (P = 5.5 x
10^-4) and 1.20 (P = 5 x 10^-4) for rs7501939 (C), rs4430796 (A) and rs1859962 (G),
respectively. The 11,290 controls (5,010 males and 6,280 females) used in this
study consisted of 758 controls randomly selected from the Icelandic genealogical
database and 10,532 individuals from other ongoing genome-wide
association studies at deCODE (specifically, ~1,400 from studies on T2D,
~1,600 from studies on breast cancer and 1,800 from studies on myocardial
infarction; studies on colon cancer, anxiety, addiction, schizophrenia and
infectious diseases provided ~700-1,000 controls each). The controls had a
mean age of 66 years (median, 67 years; range, 22-102 years). The
male controls were absent from the ICR's nationwide list of prostate
cancer patients.

The study was approved by the Data Protection Commission of Iceland and
the National Bioethics Committee of Iceland. Written informed consent was
obtained from all patients, relatives and controls. Personal identifiers associated
with medical information and blood samples were encrypted with a third-party
encryption system as previously described 18.

Study populations from The Netherlands, Spain and the US. The total
number of men with prostate cancer from The Netherlands in this study was
1,013, of whom 999 (98%) were successfully genotyped. This study population
comprised two recruitment sets of men with prostate cancer: Group A,
comprising 390 hospital-based affected individuals recruited from January
1999 to June 2006 at the Urology Outpatient Clinic of the Radboud University
Nijmegen Medical Centre (RUNMC), and Group B, consisting of 623 affected
individuals recruited from June 2006 to December 2006 through a population-based
cancer registry held by the Comprehensive Cancer Centre East. Both
groups were of self-reported European descent. The average age at diagnosis for 
patients in Group A was 63 years (median, 63 years; range, 43-83 years). The 
average age at diagnosis for patients in Group B was 65 years (median 66 years; 
range, 43-75 years). 

The 1,466 control individuals from The Netherlands were cancer free and 
were matched for age with the cases. They were recruited as part of the 
Nijmegen Biomedical Study, a population-based survey conducted by the 
Department of Epidemiology and Biostatistics and the Department of Clinical 
Chemistry of the RUNMC in which 9,371 individuals participated from a total 
of 22,500 age- and sex-stratified randomly selected inhabitants of Nijmegen, 
The Netherlands. Control individuals from the Nijmegen Biomedical Study 
were invited to participate in a study on gene-environment interactions in 
multifactorial diseases such as cancer. All the 1,466 participants in the present 
study are of self-reported European descent and were fully informed about the 
goals and the procedures of the study. The study protocol was approved by the 
Institutional Review Board of Radboud University, and all study subjects gave 
written informed consent. 

The Spanish study population consisted of 464 men with prostate cancer, of 
whom 456 (98%) were successfully genotyped. The cases were recruited from 
the Oncology Department of Zaragoza Hospital in Zaragoza, Spain, from June 
2005 to September 2006. All were of self-reported European descent. Clinical 
information, including age at onset, grade and stage, was obtained from 
medical records. The average age at diagnosis for the patients was 69 years 






(median, 70 years; range, 44-83 years). The 1,078 Spanish control individuals 
were approached at Zaragoza University Hospital and were confirmed to be 
prostate cancer free before they were included in the study. Study protocols 
were approved by the Institutional Review Board of Zaragoza University 
Hospital. All subjects gave written informed consent.

The Chicago study population consisted of 557 men with prostate cancer, of 
whom 537 (96%) were successfully genotyped. The affected individuals were 
recruited from the Pathology Core of Northwestern University's Prostate 
Cancer Specialized Program of Research Excellence (SPORE) from May 2002 
to September 2006. The average age at diagnosis for the affected individuals was 
60 years (median, 59 years; range, 39-87 years). The 514 European American
controls were recruited as healthy control subjects for genetic studies at the
University of Chicago and Northwestern University Medical School. Study
protocols were approved by the Institutional Review Boards of Northwestern
University and the University of Chicago. All subjects gave written
informed consent. 

For description of the diabetes case-control groups, see the Supple- 
mentary Note. 

Association analysis. All Icelandic case and control samples were assayed with 
the Illumina Infinium HumanHap300 SNP chip. This chip contains 317,503 
SNPs and provides about 75% genomic coverage in the Utah CEPH (CEU) 
HapMap samples for common SNPs at $r^2 \geq 0.8$. For the association analysis,
310,520 SNPs were used; 6,983 SNPs were deemed unusable owing to reasons
such as monomorphism, low yield (<95%) and failure of Hardy-Weinberg
equilibrium (HWE) (Supplementary Methods). Samples with a call rate
<98% were excluded from the analysis. Single-SNP genotyping for the five
SNPs reported here and the four case-control groups was carried out
by deCODE Genetics, applying the Centaurus 19 (Nanogen) platform to
all populations studied (Supplementary Methods and Supplementary 
Table 2a online). For the five SNPs genotyped by both methods in 
1,501 affected individuals and 758 controls from Iceland, the concordance rate 
for genotypes was >99.5% between the Illumina platform and the 
Centaurus platform. 

For SNPs that were in strong LD, whenever the genotype of one SNP was 
missing for an individual, the genotype of the correlated SNP was used to 
provide partial information through a likelihood approach, as we have done 
before 1 . This ensured that results presented in Tables 2-5 were always based on 
the same number of individuals, allowing meaningful comparisons of results 
for highly correlated SNPs. A likelihood procedure described in a previous 
publication 20 and implemented in NEMO software was used for the association 
analyses. We attempted to genotype all individuals and all SNPs reported in 
Tables 2-5. For each SNP, the yield was >95% in every group. The only 
exception was in the case of refinement marker rs4430796, which was not a part 
of the HumanHap 300 chip. For this SNP, using a single SNP assay to genotype, 
we attempted to genotype 1,883 of the 11,290 Icelandic controls (genotyping 
was successful for 99% of them (1,860 individuals)) as well as all affected 
Icelandic individuals and all individuals from the replication study groups. 
Most notably, for the 17q12 locus, when we evaluated the significance of one
SNP (for example, rs4430796, rs7501939 or rs3760511) with adjustment for 
one or two other SNPs, whether we used all 11,289 Icelandic controls that had 
genotypes for at least one of the three markers in Table 2 and handled 
the missing data by applying a likelihood approach as mentioned above or 
whether we applied logistic regression only to individuals that had genotypes 
for all three markers, the resulting P values are very similar. We tested the 
association of an allele with prostate cancer using a standard likelihood ratio 
statistic that, if the subjects were unrelated, would have asymptotically a $\chi^2$
distribution with one degree of freedom under the null hypothesis. Allelic
frequencies rather than carrier frequencies are presented for the markers in the 
main text, but genotype counts are provided in Supplementary Table 3 online. 
Allele-specific ORs and associated P values were calculated assuming a multi- 
plicative model for the two chromosomes of an individual 21 . For each of the 
four case-control groups, there was no significant deviation from HWE in the 
controls (P > 0.01). When estimating genotype-specific OR (Table 3), we 
estimated genotype frequencies in the population assuming HWE. We feel that 
this estimate is more stable than an estimate calculated using the observed 
genotype counts in controls directly. However, we note that these two 






approaches gave very similar estimates in this instance. Results from multiple
case-control groups were combined using a Mantel-Haenszel model 22 in which 
the groups were allowed to have different population frequencies for alleles, 
haplotypes and genotypes but were assumed to have common relative risks. All 
four of the European sample groups include both male and female controls. We 
did not detect a significant difference between male and female controls for 
SNPs in Tables 2-4 for each of the groups after correction for the number of 
tests performed. We note that for all the three significant variants (rs7501939,
rs4430796 and rs1859962) reported in Tables 2 and 3, we did not detect any
significant differences in frequencies among the different groups of affected
individuals (see description of Icelandic control samples) that make up the
Icelandic genome-wide control sets (P = 0.30, 0.55 and 0.88, respectively). The
individuals with T2D were removed when this test was performed for
rs7501939 and rs4430796. Our analysis of the data does not indicate any
differential association by gender of rs7501939 or rs4430796 to T2D. We used
linear regression to estimate the relationship between age at onset for prostate
cancer and number of copies of at-risk alleles (for rs7501939 and rs1859962)
carried by affected individuals, using group as an indicator.
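The pooling of case-control groups under a common relative risk can be illustrated with the classical Mantel-Haenszel pooled odds ratio. This is a minimal sketch only: the text describes a Mantel-Haenszel model inside a likelihood framework, so the closed-form estimator below is a stand-in rather than the authors' procedure, and the function name and all allele counts are hypothetical.

```python
def mantel_haenszel_or(tables):
    """Pooled odds ratio over 2x2 tables.

    Each table is (a, b, c, d):
      a = risk-allele count in cases,    b = other-allele count in cases,
      c = risk-allele count in controls, d = other-allele count in controls.
    Groups may have different allele frequencies; a common OR is assumed.
    """
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return numerator / denominator


# Hypothetical allele counts for two study groups (not data from the paper).
groups = [(600, 400, 500, 500), (330, 270, 300, 300)]
pooled = mantel_haenszel_or(groups)
per_group = [(a * d) / (b * c) for a, b, c, d in groups]
print(f"per-group ORs: {per_group}, pooled OR: {pooled:.3f}")
```

The pooled estimate is a weighted average of the per-group odds ratios, so it always lies between the smallest and the largest of them.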

To investigate potential interaction between rs7501939 C and rs1859962 G
located at 17q12 and 17q24.3, respectively, we performed two analyses. First, we
checked for the absence of significant correlation between those alleles among
cases. Second, using logistic regression, we demonstrated that the interaction
term was not significant (P = 0.57). The joint PAR was calculated as
$1 - ((1 - \mathrm{PAR}_1) \times (1 - \mathrm{PAR}_2))$, where $\mathrm{PAR}_1$ and $\mathrm{PAR}_2$ are the individual PARs
for each SNP calculated under the full model and assuming no interaction
between the SNPs.

We note that for the SNP rs757210, others have reported the results for allele
A 15. However, in the main text, we provide their corresponding results for the
other allele (allele G of rs757210) because that allele was the one positively
correlated with our reported allele C of rs7501939.

Correction for relatedness and genomic control. Some individuals in the
Icelandic case-control groups were related to each other, causing the aforementioned
$\chi^2$ test statistic to have a mean >1. We estimated the inflation
factor by calculating the mean of the 310,520 $\chi^2$ statistics, which is 1.098. Using
a method of genomic control 23 to adjust for both relatedness and potential
population stratification, results presented here are based on adjusting the
$\chi^2$ statistics by dividing each of them by 1.098. Supplementary Figure 1 is a
Q-Q plot of the observed $\chi^2$ statistics, before and after adjustment, against the
$\chi^2$ distribution with one degree of freedom.
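The adjustment described above amounts to rescaling each test statistic and recomputing its P value. The following is a minimal sketch under stated assumptions: the statistics are hypothetical, the function names are illustrative, and only the inflation factor 1.098 comes from the text. The one-degree-of-freedom chi-square tail probability is computed with the standard identity P(X > x) = erfc(sqrt(x / 2)).

```python
import math


def chi2_1df_pvalue(stat):
    """Upper-tail P value of a chi-square statistic with one degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def genomic_control(stats, inflation=None):
    """Divide each statistic by the inflation factor.

    If no factor is given, the mean of the statistics is used, as in the text.
    Returns the adjusted statistics and the factor that was applied.
    """
    if inflation is None:
        inflation = sum(stats) / len(stats)
    return [s / inflation for s in stats], inflation


# Hypothetical chi-square statistics; 1.098 is the factor quoted in the text.
raw = [0.2, 1.5, 3.9, 12.0]
adjusted, lam = genomic_control(raw, inflation=1.098)
p_before = [chi2_1df_pvalue(s) for s in raw]
p_after = [chi2_1df_pvalue(s) for s in adjusted]
```

With an inflation factor above 1, every adjusted statistic shrinks and every P value becomes larger, which is the conservative direction.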

URLs. Cancer Genetic Markers of Susceptibility Project: http://cgems.cancer.gov/.
University of California Santa Cruz Genome Browser: http://www.genome.ucsc.edu.

Requests for materials: kstefans@decode.is or julius.gudmundsson@decode.is 
ACKNOWLEDGMENTS 

We thank the research subjects whose contribution made this work possible. 
We also thank the nurses at Noatun (deCODE's sample recruitment center) 
and personnel at the deCODE core facilities. This project was funded in part
by contract number 018827 (Polygene) from the 6th Framework Program of the
European Union and by US Department of Defense Congressionally Directed
Medical Research Program W81XWH-05-1-0074. Support for the Africa America
Diabetes Mellitus (AADM) study is provided by multiple institutes of the US 
National Institutes of Health: the National Center on Minority Health and Health 
Disparities (3T37TW00041-03S2), National Institute of Diabetes and Digestive 
and Kidney Diseases (DK072128), the National Human Genome Research Institute and
the National Center for Research Resources (RR03048). The Scottish Diabetes 
case control study was funded by the Wellcome Trust. C.NJV.R and A.M. are 
supported by the Scottish Executive Generation Scotland Initiative. The Hong 



Kong Diabetes case control study was supported by the Hong Kong Research 
Grants Committee Central Allocation Scheme CUHK 1/04C. 

AUTHOR CONTRIBUTIONS 

The principal investigators of the prostate cancer replication study samples are 
J.I.M., L.A.K. and W.J.C.

COMPETING INTERESTS STATEMENT 

The authors declare competing financial interests: details accompany the full-text 
HTML version of the paper at http://www.nature.com/naturegenetics/. 

Published online at http://www.nature.com/naturegenetics

Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions



1. Amundadottir, L.T. et al. A common variant associated with prostate cancer in
European and African populations. Nat. Genet. 38, 652-658 (2006).

2. Gudmundsson, J. et al. Genome-wide association study identifies a second prostate cancer
susceptibility variant at 8q24. Nat. Genet. 39, 631-637 (2007).

3. Haiman, C.A. et al. Multiple regions within 8q24 independently affect risk for prostate
cancer. Nat. Genet. 39, 638-644 (2007).

4. Yeager, M. et al. Genome-wide association study of prostate cancer identifies a second
risk locus at 8q24. Nat. Genet. 39, 645-649 (2007).

5. Roeder, K., Bacanu, S.A., Wasserman, L. & Devlin, B. Using linkage genome scans to
improve power of association in genome scans. Am. J. Hum. Genet. 78, 243-252
(2006).

6. Lange, E.M. et al. Genome-wide scan for prostate cancer susceptibility genes using
families from the University of Michigan prostate cancer genetics project finds
evidence for linkage on chromosome 17 near BRCA1. Prostate 57, 326-334 (2003).

7. Xu, J. et al. A combined genomewide linkage scan of 1,233 families for prostate
cancer-susceptibility genes conducted by the international consortium for prostate
cancer genetics. Am. J. Hum. Genet. 77, 219-229 (2005).

8. Lange, E.M. et al. Fine-mapping the putative chromosome 17q21-22 prostate cancer
susceptibility gene to a 10 cM region based on linkage analysis. Hum. Genet. 121,
49-55 (2007).

9. Zuhlke, K.A. et al. Truncating BRCA1 mutations are uncommon in a cohort of
hereditary prostate cancer families with evidence of linkage to 17q markers. Clin.
Cancer Res. 10, 5975-5980 (2004).

10. Kraft, P. et al. Genetic variation in the HSD17B1 gene and risk of prostate cancer.
PLoS Genet. 1, e68 (2005).

11. White, K.A., Lange, E.M., Ray, A.M., Wojno, K.J. & Cooney, K.A. Prohibitin mutations
are uncommon in prostate cancer families linked to chromosome 17q. Prostate Cancer
Prostatic Dis. 9, 298-302 (2006).

12. Bellanne-Chantelot, C. et al. Large genomic rearrangements in the hepatocyte nuclear
factor-1 beta (TCF2) gene are the most frequent cause of maturity-onset diabetes of the
young type 5. Diabetes 54, 3126-3132 (2005).

13. Edghill, E.L., Bingham, C., Ellard, S. & Hattersley, A.T. Mutations in hepatocyte
nuclear factor-1beta and their related phenotypes. J. Med. Genet. 43, 84-90
(2006).

14. Kasper, J.S. & Giovannucci, E. A meta-analysis of diabetes mellitus and the risk of
prostate cancer. Cancer Epidemiol. Biomarkers Prev. 15, 2056-2062 (2006).

15. Winckler, W. et al. Evaluation of common variants in the six known maturity-onset
diabetes of the young (MODY) genes for association with type 2 diabetes. Diabetes 56,
685-693 (2007).

16. Hill-Harfe, K.L. et al. Fine mapping of chromosome 17 translocation breakpoints > or
= 900 Kb upstream of SOX9 in acampomelic campomelic dysplasia and a mild,
familial skeletal dysplasia. Am. J. Hum. Genet. 76, 663-671 (2005).

17. Risch, N. Linkage strategies for genetically complex traits. I. Multilocus models. Am. J.
Hum. Genet. 46, 222-228 (1990).

18. Gulcher, J.R., Kristjansson, K., Gudbjartsson, H. & Stefansson, K. Protection of privacy
by third-party encryption in genetic research in Iceland. Eur. J. Hum. Genet. 8,
739-742 (2000).

19. Kutyavin, I.V. et al. A novel endonuclease IV post-PCR genotyping system. Nucleic
Acids Res. 34, e128 (2006).

20. Gretarsdottir, S. et al. The gene encoding phosphodiesterase 4D confers risk of
ischemic stroke. Nat. Genet. 35, 131-138 (2003).

21. Falk, C.T. & Rubinstein, P. Haplotype relative risks: an easy reliable way to construct a
proper control sample for risk calculations. Ann. Hum. Genet. 51, 227-233 (1987).

22. Mantel, N. & Haenszel, W. Statistical aspects of the analysis of data from retrospective
studies of disease. J. Natl. Cancer Inst. 22, 719-748 (1959).

23. Devlin, B. & Roeder, K. Genomic control for association studies. Biometrics 55,
997-1004 (1999).






Ann. Hum. Genet. (1987), 51, 227-233
Printed in Great Britain 






Haplotype relative risks: an easy reliable way to construct a proper 
control sample for risk calculations 

C. T. FALK and P. RUBINSTEIN 

The Lindsley F. Kimball Research Institute of The New York Blood Center, 310 E. 67th St., 

New York, NY 10021 

SUMMARY 

An alternative to Woolf's (1955) relative risk (RR) statistic is proposed for use in calculating
the risk of disease in the presence of particular antigens or phenotypes. This alternative uses, 
as the control sample, the parental antigens or haplotypes not present in the affected child. The 
formulation of a haplotype relative risk (HRR) thus eliminates the problems of sampling from 
the same homogeneous population to form both the disease sample and an appropriate control. 

We show that, in families selected through a single affected individual, where transmission 
of the four parental haplotypes can be followed unambiguously, the mathematical expectation 
of the HRR is identical to that of the RR. Since the sample formed from the 'non-affected' 
parental haplotypes is clearly from the same population as the disease sample, the HRR thus 
provides a reliable alternative to the RR. A further advantage obtains when family data are
being collected as part of a study since the control sample is then automatically contained in 
the family material. 

Data from studies of patients with insulin dependent diabetes mellitus (IDDM) are used to 
obtain an estimate of the risk to those with HLA antigens or phenotypes associated with IDDM 
using the HRR statistic. A comparison of the HRR's and RR's for these data is also presented. 

INTRODUCTION 

Relative risks have been used for some time to estimate the increased risk of contracting a 
disease, given that a certain condition (or trait) is present, over that of the group lacking the 
condition. This formal definition of a relative risk requires prospective information that is not 
easily obtained and the relative risk is often approximated by the more easily obtained cross
product

$$\frac{\Pr(Q \mid \text{aff})\,\Pr(q \mid \text{control})}{\Pr(q \mid \text{aff})\,\Pr(Q \mid \text{control})},$$

where Q stands for the presence of the condition or trait and q for the lack of the condition, 
and the four terms are conditional probabilities as indicated. When the overall frequency of 
the disease in a population is low, this estimate will closely approximate the true relative risk. 
This odds ratio was proposed by Woolf (1955) to estimate the risk of contracting either peptic 
ulcers or stomach cancer for individuals of particular ABO phenotypes. Since then it has been 
used to calculate risks for genetic markers associated with many diseases and its most notable 
use has been in studying several HLA-associated diseases such as insulin dependent diabetes 
mellitus (IDDM), coeliac disease, multiple sclerosis and ankylosing spondylitis. Several assump- 
tions are generally made about the underlying population from which both the disease sample 




and the control sample are obtained, most importantly that both samples are drawn from the 
same genetically homogeneous population in an unbiased way. By this we mean that the disease 
sample should be selected on a clear-cut ascertainment criterion, e.g. randomly chosen affected
individuals with no bias pertaining to other factors, and the control sample should be a strictly 
random sample from the same genetic population. In practice, this latter criterion is rather 
difficult to fulfil and most often the control is created from conveniently available data drawn 
from a population thought to be somewhat closely related to that from which the disease sample 
was drawn. 

Several years ago we proposed (Rubinstein et al. 1981) an alternative method for obtaining
the control sample for relative risk (RR) estimations that eliminated the problems of sampling 
from a single homogeneous population. This method used, as a control, those parental 
haplotypes not present in the affected child and was therefore called the haplotype relative risk 
(HRR). This method has several appealing features including freedom from collection of proper 
control samples. Additionally, where families are to be studied anyway, collection of the family 
data automatically includes collection of the necessary control sample. It is, however, necessary 
to demonstrate that the HRR estimate has the appropriate characteristics. In this paper we 
will show that, assuming the 'ideal' conditions inherent in the definition of RR, namely, control
and disease samples both randomly chosen from the same homogeneous random mating
population, the expected value of the HRR is identical to that of the conventional RR. We
will then illustrate its use in the estimation of risks for HLA antigens and phenotypes associated
with IDDM.

THE MODEL 

Consider a set of families that has been ascertained through a single affected child, where the 
relevant disease locus is closely linked to a normal polymorphic genetic marker (e.g. HLA) and 
where certain alleles (antigens) are associated with the disease. For purposes of concreteness, 
we will assume that the disease is recessively inherited, although the same arguments hold for 
dominance and for other inheritance models as well. Assume that the HLA haplotypes present 
in the parents can be followed unambiguously in transmission to the offspring and designate
the two inherited by the affected child as 'a' (paternal) and 'c' (maternal). Thus haplotypes
a and c are assumed to carry the disease allele, say 'n'. In the special case where the child
as well as both parents are ac, it is not certain whether the child gets the a from the mother
or the father. However, it is still known that one a and one c haplotype were transmitted to
the affected child, and thus carry the n allele, and that the haplotypes not passed on to the
affected child were also a and c. The latter can therefore be included in the 'random sample'
as described below. Now if we have truly obtained our sample as a random, singly selected
sample, the two parental haplotypes not transmitted to the affected child (say b and d) will
represent a random sample of haplotypes from the population at large and will thus carry the
disease allele (n) or the normal allele (N) with probabilities equal to the allele frequencies in
the population (say $p_1$ and $p_2$, respectively, $p_1 + p_2 = 1$). The validity of this observation requires
compliance with certain other assumptions including (1) that the parents are not inbred, (2) 
that there is no correlation within or between parental phenotypes and (3) that there is no 
differential fertility of the disease phenotypes. 

Now assume that an antigen 'Q' at the HLA locus is in positive linkage disequilibrium with
n, the disease allele. We wish to calculate the relative risk to carriers of Q of contracting the




disease. We will use as our control population the set of 'b' and 'd' haplotypes from our sample
of disease families (that is, those haplotypes within a family not carried by the single affected
proband). Using this control we will then calculate the conventional cross product odds ratio
given above to obtain the haplotype relative risk (HRR). Define the relevant population
frequencies as follows:

$$
\begin{aligned}
f(Q) &= q_1, \\
f(q) &= q_2 = 1 - q_1 \quad \text{(where $q$ represents all other alleles)}, \\
f(n) &= p_1, \\
f(N) &= p_2 = 1 - p_1, \\
f(Qn) &= x_1 = p_1 q_1 + \delta, \\
f(QN) &= x_2 = p_2 q_1 - \delta, \\
f(qn) &= x_3 = p_1 q_2 - \delta, \\
f(qN) &= x_4 = p_2 q_2 + \delta,
\end{aligned}
$$

where $\delta$ is the measure of disequilibrium between n and Q.

We now need the four conditional probabilities necessary for the odds ratio. For the affected
sample these are the same, regardless of how we choose our control:

$$\Pr(Q \mid \text{aff}) = \frac{p_1^2 - x_3^2}{p_1^2}, \qquad \Pr(\text{not } Q \mid \text{aff}) = \frac{x_3^2}{p_1^2}.$$

Now since the control haplotypes will be a random sample from the population, the conditional
probabilities will be:

$$\Pr(Q \mid \text{control}) = 1 - (x_3 + x_4)^2 = 1 - q_2^2, \qquad \Pr(\text{not } Q \mid \text{control}) = (x_3 + x_4)^2 = q_2^2.$$

Thus the estimate of the HRR is:

$$\text{HRR} = \frac{\Pr(Q \mid \text{aff})\,\Pr(\text{not } Q \mid \text{control})}{\Pr(\text{not } Q \mid \text{aff})\,\Pr(Q \mid \text{control})} = \frac{(p_1^2 - x_3^2)\,q_2^2}{x_3^2\,(1 - q_2^2)},$$

which is identical to the equivalent expression for the conventional RR.
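The derivation can be checked numerically with exact rational arithmetic. This is a minimal sketch assuming the recessive model of this section; the function name and the parameter values (disease-allele frequency, antigen frequency, disequilibrium) are hypothetical and serve only to exercise the formulas.

```python
from fractions import Fraction as F


def hrr_expectation(p1, q1, delta):
    """Expected HRR for antigen Q under the recessive model of the text.

    p1    = frequency of the disease allele n
    q1    = frequency of the antigen Q
    delta = linkage disequilibrium between n and Q
    """
    p2, q2 = 1 - p1, 1 - q1
    x3 = p1 * q2 - delta  # f(qn): disease haplotypes lacking Q
    x4 = p2 * q2 + delta  # f(qN): normal haplotypes lacking Q
    pr_notq_aff = x3 ** 2 / p1 ** 2    # both n-bearing haplotypes lack Q
    pr_q_aff = 1 - pr_notq_aff         # equals (p1^2 - x3^2) / p1^2
    pr_notq_ctl = (x3 + x4) ** 2       # equals q2^2
    pr_q_ctl = 1 - pr_notq_ctl
    return (pr_q_aff * pr_notq_ctl) / (pr_notq_aff * pr_q_ctl)


# Hypothetical parameters: no disequilibrium, then positive disequilibrium.
no_ld = hrr_expectation(F(1, 10), F(1, 5), F(0))
with_ld = hrr_expectation(F(1, 10), F(1, 5), F(1, 50))
print(no_ld, with_ld)
```

With no disequilibrium the affected and control samples have the same antigen distribution, so the ratio is exactly 1; positive disequilibrium pushes it above 1.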

EXAMPLE 

Using data collected for the 9th HLA Workshop (Bertrams & Baur, 1984) we looked at the 
sample of families, submitted for study, where a single child was affected with IDDM and where 
the ethnic background was caucasoid (Western European or North American). The patients 






Table 1. DR phenotypes of IDDM disease sample, simplex cases

DR type     No. obs.    No. exp.
DR3, 3          6          7.8
DR3, 4         25         18.2
DR4, 4          4         10.7
DR3, X         16         19.1
DR4, X         29         22.4
DRX, X         10         11.7
Total          90         89.9

p(DR3) = 0.294; p(DR4) = 0.344; p(DRX) = 0.361; α/β = (0.278)/(0.202) = 1.38.
Table 2. DR phenotypes of control sample consisting of non-affected parental haplotypes

DR type     No. obs.    No. exp.
DR3, 3          0          0.77
DR3, 4          2          1.23
DR4, 4          0          0.49
DR3, X         13         12.26
DR4, X         10          9.76
DRX, X         48         48.49
Total          73         73.00

p(DR3) = 0.103; p(DR4) = 0.082; p(DRX) = 0.815; $\chi^2$ = 1.79, 2 d.f.

were categorized with respect to their HLA DR phenotypes using three distinct allelic groups
DR3, DR4, and DRX, where DRX represents all other DR antigens except DR3 and DR4. The
results are shown in Table 1 with estimated allele frequencies and observed and 'Hardy-Weinberg
expected' numbers for each phenotypic class. The α/β ratio of Falk et al. (1983) was
also calculated and found to be 1.38. This ratio relates the observed frequency (α) of, say the
DR3, 4 phenotype, to the Hardy-Weinberg expected frequency [β = 2p(DR3)p(DR4)] in a
sample of diseased individuals (Table 1). A value in excess of 1.0 is an indication that the
associated susceptibility locus does not show a simple dominant or recessive mode of inheritance
with a single susceptibility allele. The value of 1.38 found here is characteristic of samples of
IDDM individuals where an excess of DR3, 4's is often observed thus suggesting a more complex
mode of inheritance for susceptibility (Falk, 1984). The 'control group' was made up of the
parental haplotype pairs not present in the affected child (only families in which all four HLA
haplotypes could be followed were used). There were 146 parental control haplotypes. The allele
frequencies for DR3, DR4, and DRX in this group were 0.103, 0.082, and 0.815 respectively.
These values agree remarkably well with the total frequencies obtained for the 'random mating
population' comprising all caucasoid random individuals submitted to the 9th HLA Workshop
(Baur et al. 1984) (see, e.g. the table on page 694, where the DR marginal frequencies are 0.122,
0.129, and 0.749 for the same three DR alleles). If the control haplotypes from each family
are assumed to be a 'control individual', we obtain a control population sample of 73 which
is in H-W equilibrium ($\chi^2$ = 1.79, 2 d.f., see Table 2).
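The allele frequencies, Hardy-Weinberg expected numbers, α/β ratio and goodness-of-fit statistic discussed above can be recomputed from the phenotype counts. The sketch below uses the observed counts given in the paper's tables for the disease sample (6, 25, 4, 16, 29, 10) and the parental control sample (0, 2, 0, 13, 10, 48); the function names are illustrative, and the unrounded results may differ slightly from the printed values, which were computed from rounded frequencies.

```python
def allele_freqs(obs):
    """Allele frequencies by gene counting from phenotype counts."""
    n = sum(obs.values())
    counts = {}
    for (a, b), k in obs.items():
        counts[a] = counts.get(a, 0) + k
        counts[b] = counts.get(b, 0) + k  # homozygotes contribute twice
    return {allele: c / (2 * n) for allele, c in counts.items()}


def hw_expected(obs):
    """Hardy-Weinberg expected number for each phenotype class."""
    n = sum(obs.values())
    p = allele_freqs(obs)
    return {(a, b): n * (p[a] ** 2 if a == b else 2 * p[a] * p[b])
            for (a, b) in obs}


def alpha_beta(obs, pair=("3", "4")):
    """Observed heterozygote frequency over its Hardy-Weinberg expectation."""
    n = sum(obs.values())
    p = allele_freqs(obs)
    return (obs[pair] / n) / (2 * p[pair[0]] * p[pair[1]])


def chi_square(obs):
    """Pearson goodness-of-fit statistic against Hardy-Weinberg expectations."""
    exp = hw_expected(obs)
    return sum((obs[k] - exp[k]) ** 2 / exp[k] for k in obs)


keys = [("3", "3"), ("3", "4"), ("4", "4"), ("3", "X"), ("4", "X"), ("X", "X")]
disease = dict(zip(keys, [6, 25, 4, 16, 29, 10]))
control = dict(zip(keys, [0, 2, 0, 13, 10, 48]))

print(allele_freqs(disease), alpha_beta(disease), chi_square(control))
```

Running this gives an α/β ratio of about 1.37 for the disease sample and a chi-square of about 1.80 for the control sample, close to the 1.38 and 1.79 quoted in the text.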

In Table 3, we compare the HRR's for DR3 and DR4 to the RR's calculated using a
'contrived control population' from the 9th HLA Workshop population data referred to above.
This 'population' is assumed to be in H-W equilibrium and our 'random sample' is of the same






Table 3. HRRs and RRs for the DR3 and DR4 antigens in a sample of simplex IDDM patients

(The control for the HRR's is the sample of parental haplotypes not present in the affected individuals.
The control for the RR's was obtained by 'creating' a H-W sample assuming the antigen frequencies
recorded for the 9th HLA workshop (Baur et al. 1984).)

                       DR3                          DR4
                 +      -    Total            +      -    Total
HRR
  Disease       47     43      90            58     32      90
  Control       15     58      73            12     61      73
  Total         62    101     163            70     93     163
                HRR = 4.23                   HRR = 9.21
                p = 2.6 x 10^-5              p = 7.6 x 10^-10

RR
  Disease       47     43      90            58     32      90
  Control       21     69      90            22     68      90
  Total         68    112     180            80    100     180
                RR = 3.59                    RR = 5.60
                p = 5.3 x 10^[exponent illegible]   p = 6.8 x 10^[exponent illegible]



Table 4. HRRs and RRs for the DR3, 3, DR3, 4 and DR4, 4 phenotypes

(Samples are the same as those described in Table 3. In each case comparison is made relative to the
'base group' DRX, X to avoid the problems of non-independent risk estimates.)

DR type     Disease sample    Parental control    Workshop control
DR3, 3             6                  0                  1.3
DR3, 4            25                  2                  2.8
DR4, 4             4                  0                  1.5
DR3, X            16                 13                 16.4
DR4, X            29                 10                 17.4
DRX, X            10                 48                 50.5
Total             90                 73                 89.9

HRR                               RR
HRR(3, 4) = 60.0                  RR(3, 4) = 45.1
HRR(3, 3) = undefined             RR(3, 3) = 23.3
HRR(4, 4) = undefined             RR(4, 4) = 13.5

If 'expected values' are substituted for the zero observations in the parental control, one gets:

HRR'(3, 3) = 37.4.
HRR'(4, 4) = 39.2.



size as our disease sample (i.e. 90 individuals). Table 4 gives HRR's and RR's for the three DR 
phenotypes DR3, 3, DR3, 4, and DR4, 4 using the same samples. Here the risks are compared 
to the baseline phenotype DRX, X in each case since the risks are not independent (cf. 
Curie-Cohen, 1981, Svejgaard & Ryder, 1981). Note that the HRR's for DR3, 3 and DR4, 4 
are undefined since there are no 'individuals' with those phenotypes in the control sample of 
73. If expected values are substituted for the 'zero' values in those cases HRR's can be estimated 
as given at the bottom of Table 4, but the use of such estimates must be made with caution. 
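
The ratios in Tables 3 and 4 can be checked with a few lines of arithmetic. The sketch below assumes that both the HRR and the RR are computed as the ordinary cross-product ratio of a 2 x 2 table, and it reports a ratio as undefined (None) when a zero count falls in the denominator, as in Table 4. The function and variable names are illustrative and do not come from the paper, and no attempt is made to reproduce the printed p values, since the test used for them is not described in this part of the text.

```python
def cross_product_ratio(case_pos, case_neg, ctrl_pos, ctrl_neg):
    """Cross-product ratio of a 2 x 2 table; None when it is undefined."""
    denominator = case_neg * ctrl_pos
    if denominator == 0:
        return None
    return (case_pos * ctrl_neg) / denominator


# Table 3 counts: (disease +, disease -, control +, control -)
table3 = {
    "HRR DR3": (47, 43, 15, 58),
    "HRR DR4": (58, 32, 12, 61),
    "RR DR3": (47, 43, 21, 69),
    "RR DR4": (58, 32, 22, 68),
}
table3_ratios = {name: round(cross_product_ratio(*cells), 2)
                 for name, cells in table3.items()}

# Table 4 counts by DR type: (disease, parental control, workshop control)
table4 = {
    "3,3": (6, 0, 1.3),
    "3,4": (25, 2, 2.8),
    "4,4": (4, 0, 1.5),
    "X,X": (10, 48, 50.5),
}


def risk_vs_base(dr_type, control_column):
    """Risk of dr_type relative to the base group DRX, X.

    control_column: 1 = parental control (HRR), 2 = workshop control (RR).
    """
    disease, base_disease = table4[dr_type][0], table4["X,X"][0]
    control = table4[dr_type][control_column]
    base_control = table4["X,X"][control_column]
    return cross_product_ratio(disease, base_disease, control, base_control)


print(table3_ratios)
print(risk_vs_base("3,4", 1), risk_vs_base("3,4", 2))
print(risk_vs_base("3,3", 1), risk_vs_base("3,3", 2))
```

Under these assumptions the Table 3 ratios and the Table 4 ratios for DR3, 4 agree with the printed values, and the parental-control ratios for DR3, 3 and DR4, 4 come out undefined because of the zero counts.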



C. T. Falk and P. Rubinstein 



DISCUSSION 

One of the major problems inherent in proper calculations of relative risks (RR's) is that of 
choosing an appropriate control. A basic assumption in the use of RR's is that both the affected 
sample and the control sample are chosen at random from the same genetically homogeneous 
random mating population with no selection criteria except for the disease status required for 
inclusion in the affected sample. In practice this is a difficult criterion to fulfil. Additionally, 
it adds a significant amount of work to select and test such a control sample. It is therefore 
often assumed that the control sample is simply a hypothetical sample created from a population 
thought to be similar to that of the disease sample and 'generated' from that population by 
assuming H-W equilibrium and some reasonable sample size (cf. Svejgaard & Ryder, 1981, and 
our 'contrived' sample of the previous section). 

Given the known heterogeneity of current urban populations, even within the less hetero- 
geneous European countries, use of population control data culled, for example, from HLA 
workshop surveys, may alter the significance of calculated RR's. Although, in the examples 
given here, the results are significant for both RR's and HRR's (Table 3), the 'p values' for 
significance differ by two-fold (for DR3) and 100-fold (for DR4), with the HRR's being more 
significant in each case. If less extreme samples were tested, careless choice of the control group 
could very well make the difference between statistical significance and non-significance 
(resulting in either a type I or a type II error). 

Methods have previously been proposed for using sibship information to calculate 'risks'. For 
example, Clarke (1961) describes a method, attributed to C. A. B. Smith, for using sibships to 
test for a significant risk of duodenal ulcers to individuals of blood group O. The method used 
is somewhat different from that described here in that an observed and expected probability 
of being group O is assigned to the propositus in each sibship where the expected value depends 
on the makeup of the sibship. The significance is then based on a comparison of pooled observed 
and expected values over a set of sibships. This method does overcome the problem of 
heterogeneity but, because of the way the test is constructed, only a small part of the data can 
be used. In Clarke's example, therefore, the associations found when using the general 
population as a control were very much decreased when using Smith's sibship method. This does 
not seem to be the case using HRR's where the associations remain strong. 

By using the two parental haplotypes not present in the single diseased individuals of the 
disease sample as the 'control sample', we are assured of having both samples from the same 
genetic population and, as was demonstrated above, this sample should represent a random 
sample of haplotype pairs (or 'individuals') from that population. Care must still be taken to 
ensure that the population chosen is genetically homogeneous, to the extent possible, but the 
task of obtaining an appropriate control is simplified. 

If the disease is dominant rather than recessive, the HRR can still be used in the same way. 
Although it is not known whether the disease allele is present on the paternal haplotype ('a') 
or the maternal ('c') or perhaps on both, the other two parental haplotypes, b and d, will still 
represent random haplotypes from the underlying population, provided that the conditions 
mentioned for the recessive case obtain. 

If a family is selected through more than one affected child, the situation is somewhat 
different. If the two affected sibs share the same two HLA haplotypes then the other two should 
still represent random haplotypes from the population. However, if they share fewer than two 
haplotypes, the situation is more complicated. Now three (or possibly four) haplotypes are 
known to carry the disease allele in the recessive case. If the disease is dominant, it is possible, 
but not certain, that a single shared haplotype carries the disease allele. If no haplotype is 
shared, it is not possible to define disease-carrying haplotypes with certainty. In such cases it 
would therefore be difficult to define a control sample of random haplotypes meeting the 
necessary criteria. 

Two other points should be emphasized. If there is differential selection between genotypes 
at the susceptibility locus (e.g. reduced fertility), a bias might be introduced such that the 
control haplotypes could no longer be considered a random population sample. Thus we require 
compliance with assumption (3) of our model to ensure the proper distribution of susceptibility 
alleles in the 'control' haplotypes. 

Further, if the population from which the sample is drawn is genetically heterogeneous with 
respect to the disease, the HRR as well as the RR may be difficult to interpret as well as to 
use. In an extreme case a population might be made up of two ethnically distinct subpopulations 
that do not interbreed. Assume that the disease of interest occurs in only one of two such 
subpopulations. An estimate of the HRR would come entirely from a sample taken from the 
subpopulation where the disease is present and would be relevant only to that population 
(individuals in the other group having no risk, by definition). On the other hand, the RR would 
assign a risk over the entire population that would be too low for individuals in the susceptible 
part of the population and too high for individuals in the non-susceptible part. 

We wish to thank Drs Jurg Ott, Neil Risch and C. A. B. Smith for helpful and constructive comments on 
an earlier draft of this paper. 
This work was supported by NIH grant GM29177. 



REFERENCES 

Baur, M. P., Neugebauer, M. & Albert, E. D. (1984). Reference tables of two-locus haplotype frequencies 
    for all MHC marker loci. In Histocompatibility Testing (eds. E. D. Albert, M. P. Baur and W. R. Mayr), 
    pp. 677-755. Berlin: Springer-Verlag. 
Bertrams, J. & Baur, M. P. (1984). Insulin-dependent diabetes mellitus. In Histocompatibility Testing 
    (eds. E. D. Albert, M. P. Baur and W. R. Mayr), pp. 348-358. Berlin: Springer-Verlag. 
Clarke, C. A. (1961). Blood Groups and Disease. Progress in Medical Genetics 1, 81-119. 
Curie-Cohen, M. (1981). HLA antigens and susceptibility to juvenile diabetes: do additive relative 
    risks imply genetic heterogeneity? Tissue Antigens 17, 136-148. 
Falk, C. T., Mendell, N. R. & Rubinstein, P. (1983). Effect of population associations and reduced 
    penetrance on observed and expected genotype frequencies in a simple genetic model: application to HLA 
    and insulin dependent diabetes mellitus. Ann. Hum. Genet. 47, 161-165. 
Falk, C. T. (1984). A two-susceptibility-allele model for genetic diseases and associated marker loci: 
    differences and similarities to a one-s-allele model. Ann. Hum. Genet. 48, 87-95. 
Rubinstein, P., Walker, M., Carpenter, C., Carrier, C., Krassner, J., Falk, C. & Ginsberg, F. (1981). 
    Genetics of HLA disease associations. The use of the haplotype relative risk (HRR) and the "haplo-delta" 
    (Dh) estimates in juvenile diabetes from three racial groups. Human Immunology 3, 384 (Abstract). 
Svejgaard, A. & Ryder, L. P. (1981). HLA genotype distribution and genetic models of insulin-dependent 
    diabetes mellitus. Ann. Hum. Genet. 45, 293-298. 
Woolf, B. (1955). On estimating the relation between blood group and disease. Ann. Hum. Genet. 19, 251-253. 






Statistical Aspects of the Analysis of 
Data From Retrospective Studies of 
Disease ¹ 



Nathan Mantel and William Haenszel, Biometry 
Branch, National Cancer Institute,² Bethesda, 
Maryland 



Summary 

The role and limitations of retrospective investigations of factors possibly 
associated with the occurrence of a disease are discussed and their 
relationship to forward-type studies emphasized. Examples of situations 
in which misleading associations could arise through the use of inappropriate 
control groups are presented. The possibility of misleading associations 
may be minimized by controlling or matching on factors which 
could produce such associations; the statistical analysis will then be 
modified. Statistical methodology is presented for analyzing retrospective 
study data, including chi-square measures of statistical significance 
of the observed association between the disease and the factor 
under study, and measures for interpreting the association in terms of an 
increased relative risk of disease. An extension of the chi-square test 
to the situation where data are subclassified by factors controlled in the 
analysis is given. A summary relative risk formula, R, is presented and 
discussed in connection with the problem of weighting the individual 
subcategory relative risks according to their importance or their precision. 
Alternative relative-risk formulas, R1, R2, R3, and R4, which require the 
calculation of subcategory-adjusted proportions of the study factor 
among diseased persons and controls for the computation of relative 
risks, are discussed. While these latter formulas may be useful in many 
instances, they may be biased or inconsistent and are not, in fact, averages 
of the relative risks observed in the separate subcategories. Only 
the relative-risk formula, R, of those presented, can be viewed as such an 
average. The relationship of the matched-sample method to the 
subclassification approach is indicated. The statistical methodology presented 
is illustrated with examples from a study of women with epidermoid 
and undifferentiated pulmonary carcinoma. J. Nat. Cancer Inst. 22: 719-748, 
1959. 



Introduction 

A retrospective study of disease occurrence may be defined as one in 
which the determination of association of a disease with some factor is 
based on an unusually high or low frequency of that factor among diseased 
persons. This contrasts with a forward study in which one looks instead 
for an unusually high or low occurrence of the disease among individuals 
possessing the factor in question. Each approach has its advantages. 
Among the desirable attributes of the retrospective study is the ability to 
yield results from presently collectible data, whereas the forward study 
usually requires future observation of individuals over an extended period 
(this is not always true; if the status of individuals can be determined 
as of some past date, the data for a forward study may already be at 
hand). The retrospective approach is also adapted to the limited resources 
of an individual investigator and places a premium on the formulation of 
hypotheses for testing, rather than on facilities for data collection. For 
especially rare diseases, a retrospective study may be the only feasible 
approach, since the forward study may prove too expensive to consider 
and the study size required to obtain a respectable number of cases 
completely unmanageable. 

¹ Received for publication November 6, 1958. 

² National Institutes of Health, Public Health Service, U.S. Department of Health, Education, and Welfare. 

In the absence of important biases in the study setting, the retrospec- 
tive method could be regarded, according to sound statistical theory, as 
the study method of choice. This follows from the much reduced sample 
sizes required by this approach and may be illustrated by the following 
extreme example. If a disease attack rate of 10 per 100,000 among 50 
percent of the population free of some factor were increased tenfold among 
the other half of the population subject to the factor, a retrospective study 
of 100 cases and 100 controls would, with high probability, reveal this 
significantly increased risk. On the other hand, a forward study cover- 
ing 2,000 persons, half with and half without the factor, would almost 
certainly fail to detect a significant difference. For comparable ability 
to find the type of increased risk just indicated, a forward study would 
need to cover about 500 times as many individuals as the corresponding 
retrospective study. The disparity in the required number of persons to 
be studied could, of course, be reduced by lengthening the follow-up period 
for forward studies to increase the experience in terms of person-years 
observed. The larger sample size required for the forward study reflects 
principally the infrequent occurrence of the disease entity under investiga- 
tion. In the example illustrated, uncovering 100 cases of disease in a for- 
ward study would require either 100,000 individuals with the factor or 
1,000,000 without. For diseases with a higher probability of occurrence 
the disparity in required size between retrospective and forward studies 
would be progressively reduced. 
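
The arithmetic behind this example can be laid out explicitly. The sketch below uses only the figures given in the paragraph (an attack rate of 10 per 100,000 without the factor, ten times that with it, and a population divided evenly between the two groups); the function and variable names are illustrative.

```python
from fractions import Fraction

RATE_WITHOUT = Fraction(10, 100_000)  # attack rate in persons free of the factor
RATE_WITH = 10 * RATE_WITHOUT         # tenfold increase in persons with the factor


def expected_cases(persons, rate):
    """Expected number of cases among a given number of persons."""
    return persons * rate


def persons_needed(cases, rate):
    """Number of persons who must be observed to expect a given number of cases."""
    return cases / rate


# Forward study of 2,000 persons, half with and half without the factor.
forward_with = expected_cases(1_000, RATE_WITH)
forward_without = expected_cases(1_000, RATE_WITHOUT)

# Persons required for a forward study to uncover 100 cases.
needed_with = persons_needed(100, RATE_WITH)
needed_without = persons_needed(100, RATE_WITHOUT)

# Retrospective view: with the population split evenly, the share of all
# cases that carry the factor.
share_of_cases_with_factor = RATE_WITH / (RATE_WITH + RATE_WITHOUT)

print(forward_with, forward_without)       # 1 and 1/10
print(needed_with, needed_without)         # 100000 and 1000000
print(share_of_cases_with_factor)          # 10/11
```

With about one expected case among the 1,000 persons with the factor and a tenth of a case among the 1,000 without, the forward study has almost nothing to compare. In the retrospective study roughly 91 of 100 cases would carry the factor, against about half of the controls.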

The retrospective study might be looked upon as a natural extension 
of the practice of physicians since the time of Hippocrates, to take case 
histories as an aid to diagnosis. Its guise has varied with respect to the 
means of measuring the prevalence of the suspect factor among diseased 
persons and the criteria for determining unusual departures from normal 
experience. When an association is so marked, as in Percival Pott's 
observations on the representation of chimney sweeps among cases of 
scrotal cancer, no further quantitative data are required to perceive its 
significance. 

The retrospective approach has often been employed in studies of communicable 
diseases, one illustration being Snow's observations (1) on a 
common water supply for cholera cases in an area served by several sources 
(there would have been no element of unusualness had there been but one 
water supply). When a disease is epidemic in a circumscribed locality, 
the disease-free population in the same area offers a natural contrast. The 
method may be used successfully for endemic diseases as well. Holmes, 
in reaching his conclusions on the communicable nature of puerperal fever 
(2), noted particularly that a large number of women with puerperal fever 
had been attended by the same physicians. In this context it should be 
emphasized that communicable disease investigations have often combined 
retrospective and forward study methods. For example, Snow 
supplemented his retrospective observations on water supply by a contrast 
of cholera rates among subscribers of the Southwark and Vauxhall 
water company with the experience of persons served by the Lambeth 
water company within the same area. 

When a disease occurs sporadically, or its occurrence is not confined to 
a well-defined group (such as women at childbirth), a choice of controls 
is not immediately evident. For cancer and other diseases characterized 
by high fatality rates, a study restricted to decedents might use persons 
dying from other causes as controls. Rigoni Stern adopted this tech- 
nique in deducing the relationship of cancer of the breast and of the 
uterus to pregnancy history (3). Some contemporary studies have also 
used deaths from other causes as controls (4, 5). 

The present-day controlled retrospective studies of cancer date from 
the Lane-Claypon paper on breast cancer published in 1926 (6). This 
report is significant in setting forth procedures for selecting matched 
hospital controls and relating them to a consideration of study objectives. 
Retrospective techniques have since been applied in several investigations 
of cancer, including the following partial list of current references for a 
few primary sites: bladder (7-10), breast (11-13), cervix (13-16), larynx 
(17, 18), leukemia (19), lung (18, 20-27), and stomach (18, 28-30). 

Statisticians have been somewhat reluctant to discuss the analysis of 
data gathered by retrospective techniques, possibly because their train- 
ing emphasizes the importance of defining a universe and specifying rules 
for counting events or drawing samples possessing certain properties. 
To them, proceeding from "effect to cause," with its consequent lack of 
specificity of a study population at risk, seems an unnatural approach. 
Certainly, the retrospective study raises some questions concerning the 
representative nature of the cases and controls in a given situation which 
cannot be completely satisfied by internal examination of any single set 
of data. 

Only a few published papers have treated the statistical aspects of 
retrospective studies. Cornfield discussed the problem in terms of estimated 
measures of relative and absolute risks arising from contrasts of 
persons with and without specified characteristics (31). His paper was 
concerned with the simple situation of a homogeneous population of cases 
and controls, presumably alike in all characteristics except the one under 

Vol. 22, No. 4, April 1959 

investigation, which could be represented by a single contingency table. 
In a later contribution he handled the problem of controlling for other 
variables by adjusting the distribution of controls to the observed distribution 
of cases (16). Dorn briefly mentions retrospective studies with 
emphasis on such topics as sources of data, choice of controls, and validity 
of inferences (32). 

This paper presents a method for computing relative risks for retro- 
spective study contrasts, which controls for the effects of other variables 
by use of the basic statistical principle of subclassification of data. The 
related problem of significance testing is also considered. Since details 
of statistical treatment are conditioned by study objectives, data collection 
methods, choice of a control series, and the use of matched or unmatched 
controls, these topics are also discussed briefly. 
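
As a rough illustration of what combining subclassified data can look like, the sketch below pools several 2 x 2 tables with the estimator now commonly called the Mantel-Haenszel summary odds ratio, sum(a*d/n) / sum(b*c/n). The stratum counts are invented for illustration and the function name is an assumption; the paper's own formulas and notation are not reproduced in this part of the text.

```python
from fractions import Fraction


def summary_relative_risk(strata):
    """Pooled cross-product ratio over subcategories.

    strata: list of (a, b, c, d) per subcategory, where
      a = cases with the factor,    b = cases without it,
      c = controls with the factor, d = controls without it.
    """
    numerator = sum(Fraction(a * d, a + b + c + d) for a, b, c, d in strata)
    denominator = sum(Fraction(b * c, a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


# Hypothetical counts for two subcategories (for example, two age groups).
strata = [(10, 20, 5, 40), (30, 10, 20, 20)]

# Cross-product ratio within each subcategory, then the pooled value.
stratum_ratios = [Fraction(a * d, b * c) for a, b, c, d in strata]
pooled = summary_relative_risk(strata)

print(stratum_ratios)   # the two subcategory ratios, 4 and 3
print(pooled)           # 77/23, between the two subcategory ratios
```

The pooled value lies between the subcategory ratios, which is the sense in which such a summary can be read as a weighted average of them.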

Objectives 

Retrospective studies are relatively inexpensive and can play a valuable 
role as scouting forays to uncover leads on hitherto unknown effects, 
which can then be explored further by other techniques. The effects may 
be novel and not suggested by existing data, as in the pioneer work on the 
association of smoking and lung cancer or the association of blood type 
and gastric cancer, or they may represent refinements of current know- 
ledge. The latter category might include collection of lifetime residence 
and/or work histories to elaborate differences in incidence and mortality 
which appear when some diseases are classified by last place of residence 
or last occupation of the newly diagnosed case or decedent. 

With diseases of low incidence the controlled retrospective study may 
be the only feasible approach. Here emphasis should be placed on 
assembling results from several studies. Before accepting a finding and 
offering an interpretation, scientific caution calls for ascertaining whether 
it can be reproduced by others and in other administrative settings having 
their own peculiar biases. 

A primary goal is to reach the same conclusions in a retrospective study 
as would have been obtained from a forward study, if one had been done. Even 
when observations for a forward study have been collected, a supple- 
mentary retrospective approach to the same body of material may prove 
useful in collecting more data on points not covered in the original study 
design or in amplifying suggestive associations appearing in the initial 
forward-study results. 

The findings of a retrospective study are necessarily in the form of 
statements about associations between diseases and factors, rather than 
about cause and effect relationships. This is due to the inability of the 
retrospective study to distinguish among the possible forms of associa- 
tion — cause and effect, association due to common causes, etc. Similar 
difficulties of interpretation arise in forward studies as well. A forward 
study, to avoid these difficulties, would need to be performed with the 
preciseness of a laboratory experiment. For example, such a study of 
associations with cigarette smoking would require that an investigator 
randomly assign his subjects in advance to the various smoking categories, 
rather than simply note the categories to which they belong. The 
inherent practical difficulties of such an enterprise are evident. 

In addition to the failings shared with the forward study, the retro- 
spective study is further exposed to misleading associations arising from 
the circumstances under which test and control subjects are obtained. 
The retrospective study picks up factors associated with becoming a 
diseased or a disease-free subject, rather than simply factors associated 
with presence or absence of the disease. The difficulties in this regard 
may be most pronounced when the study group represents a cross section 
of patients alive at any time (prevalence), including some who have been 
ill for a long period. Inclusion of the latter may lead to identification 
of items associated with the course of the illness, unrelated to increased 
or decreased risk of developing the disease. The theoretical point has 
been raised that factors conducive to longer survival of patients may be 
found in "prevalence" samples and interpreted erroneously as being 
associated with excess liability to the disease (33). Loopholes of this 
type are minimized when investigations are restricted to samples of 
newly diagnosed patients (incidence). 

A partial remedy for these uncertainties lies in employing a conserva- 
tive approach to interpretation of the associations observed. Recognizing 
the ease with which associations may be influenced by extraneous factors, 
the investigator may require not only that the measure of relative risk 
be significantly different from unity but also that it be importantly 
different. He may, for instance, require that the data indicate an 
increased relative risk for a characteristic of at least 50 percent, on the 
assumption that an excess of this magnitude would not arise from extrane- 
ous factors alone. However, the use of such conservative procedures 
emphasizes a corresponding need to pinpoint the disease entity under 
study. A strong relationship between a factor and a disease entity 
might fail to be revealed, if the entity was included in a larger, less well- 
defined, disease category. After the event from data now at hand, 
we know that a study of the association of cigarette smoking with epider- 
moid and undifferentiated pulmonary carcinoma is more revealing 
than an inquiry covering all histologic types of lung cancer. 

Multiple Comparison Problem 

The present-day retrospective study is usually concerned with investi- 
gating a variety of associations with a disease, little effort being involved 
in acquiring, within limits, added information from respondents. The 
results may be analyzed in a number of ways: the various factors may 
be investigated separately, without regard to the other factors; they may 
be investigated in conjunction with each other, a particular conjunction 
being considered a factor in its own right; or, more commonly, a factor 
may be tested with control for the presence or absence of other factors. 
Thus, if the role of cigarette smoking and coffee drinking in a given 
disease are under study, the possible comparisons include the relative 
risk of disease for individuals who both smoke and drink coffee as opposed 
to all other persons, or as opposed to those who neither smoke nor drink 
coffee. In addition, the relative risk associated with smoking might be 
obtained separately for drinkers and nondrinkers of coffee, with a weighted 
average of these two relative risks constituting still another item. Con- 
versely, risks associated with coffee drinking, with adjustments for cigarette 
smoking, could be computed. 

The number of potential comparisons arising from a comprehensive retrospective 
study can be large. Almost any reasonable level of statistical significance 
used to test a single contrast, when applied to a long series of contrasts, 
will, with a high degree of probability, result in some contrasts testing 
significant, even in the absence of any real associations. The usual 
prescription for coping with this multiple comparison problem (requiring 
individual comparisons to test significant at an extreme probability level 
to reduce the number of associations incorrectly asserted to be true) 
would result only in making real associations difficult to detect. 
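
To see how quickly this happens, suppose (as a simplifying assumption not made in the text) that the contrasts are independent and that no real associations exist. The chance that at least one of k contrasts tests significant at level alpha is then 1 - (1 - alpha)^k. A minimal sketch, with illustrative function names:

```python
def prob_at_least_one_significant(alpha, k):
    """Chance that one or more of k independent null contrasts test significant."""
    return 1.0 - (1.0 - alpha) ** k


def per_contrast_level(overall_alpha, k):
    """Level for each contrast that holds the overall chance at overall_alpha,
    under the same independence assumption."""
    return 1.0 - (1.0 - overall_alpha) ** (1.0 / k)


for k in (1, 5, 20, 100):
    print(k, round(prob_at_least_one_significant(0.05, k), 3))

print(per_contrast_level(0.05, 20))
```

With 20 contrasts each tested at the 5 percent level, the chance of at least one spurious finding is about 0.64. Holding the overall chance at 5 percent requires each contrast to be tested at roughly the 0.26 percent level, which is the kind of extreme level that makes real associations hard to detect.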

However, the multiple comparison problem exists only when inferences 
are to be drawn from a single set of data. If the purpose of the retro- 
spective study is to uncover leads for fuller investigation, it becomes 
clear there is no real multiple significance testing problem— a single 
retrospective study does not yield conclusions, only leads. Also, the 
problem does not exist when several retrospective and other type studies 
are at hand, since the inferences will be based on a collation of evidence, 
the degree of agreement and reproducibility among studies, and their 
consistency with other types of available evidence, and not on the 
findings of a single study. 

Nevertheless, it would be wise to employ testing procedures which do 
not lead to a superabundance of potential clues from any one study. 
This may be achieved by employing nominal significance levels in testing 
factors of primary interest incorporated into the design of an investigation 
and applying more stringent significance tests to comparisons of secondary 
interest or to comparisons suggested by the data. For the usual problem 
of multiple significance testing, this would be equivalent to allocating a 
large part of the desired risk of erroneous acceptance of an association as 
real to a small group of comparisons where fruitful results were anticipated, 
and parceling out the remainder of the available risk to the large bulk of 
comparisons of a more secondary nature. This minimizes the risk of 
diluting, through inclusion of many secondary comparisons, the chances 
for detecting an important primary effect. 
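
One way to picture this allocation is as a simple budget: the total acceptable risk of accepting a false association is split between a few primary comparisons and many secondary ones, and each comparison is tested at its share. The split, the counts, and the function name below are hypothetical, and the additive (Bonferroni-style) bookkeeping is an assumption chosen for simplicity, not a procedure prescribed in the text.

```python
from fractions import Fraction


def allocate_levels(total_risk, primary_share, n_primary, n_secondary):
    """Split total_risk between primary and secondary comparisons.

    Returns the significance level for each primary comparison and for each
    secondary comparison.
    """
    primary_budget = total_risk * primary_share
    secondary_budget = total_risk - primary_budget
    return primary_budget / n_primary, secondary_budget / n_secondary


# Hypothetical design: 5 percent total risk, four fifths of it reserved for
# 2 primary comparisons, the remainder spread over 40 secondary comparisons.
primary_level, secondary_level = allocate_levels(
    total_risk=Fraction(5, 100),
    primary_share=Fraction(4, 5),
    n_primary=2,
    n_secondary=40,
)

print(float(primary_level))    # 0.02
print(float(secondary_level))  # 0.00025
```

In this made-up design each primary comparison is tested at the 2 percent level, far less stringent than the 0.12 percent that an even split over all 42 comparisons would impose.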

Representative Nature of Data 

The fundamental assumption underlying the analysis of retrospective 
data is that the assembled cases and controls are representative of the 
universe defined for investigation. This obligates the investigator not 
only to examine the data which are the end product but also to go behind 
the scenes and evaluate the forces which have channeled the material to 
his attention, including such items as local practices of referral to specialists 
and hospitals and the patient's condition and the effect of these items 
on the probability of diagnosis or hospital admission. We re-emphasize 
that this requires the exercise of judgment on the potential magnitude of 
biases and as to whether they could result in factors seeming to be related 
to a disease, in the absence of a real association of the factor with presence 
or absence of the disease. The danger of bias may be greatest in working 
with material from a single diagnostic source or institution. 

Among the more important practical considerations affecting retro- 
spective studies is that they are ordinarily designed to follow the line of 
least resistance in obtaining case and control histories. This means that 
cases and controls will often be hospital patients rather than persons in 
the general population outside hospitals. As a result, any factor which 
increases the probability that a diseased individual will be hospitalized 
for the disease may mistakenly be found to be associated with the disease. 
For example, Berkson (34) and White (35) have pointed out that positive 
association between two diseases, not present in the general population, 
may be produced when hospital admissions alone are studied, because 
persons with a combination of complaints are more likely to require 
hospital treatment. In theory, bias might also be produced in reverse 
manner, if the suspect factor diminished the probability of hospitalization 
for other diagnoses used as controls. The difficulties are not unique for 
hospital patients. Similar loopholes in interpretation may be advanced 
for any special groups used as sources of cases and controls. 
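
A small numerical illustration may help here. The sketch below sets up a hypothetical population in which two diseases, A and B, occur independently, and then applies hypothetical admission probabilities under which persons with both complaints are much more likely to be hospitalized. All figures and names are invented for illustration; only the mechanism comes from the text.

```python
from fractions import Fraction


def odds_ratio(both, a_only, b_only, neither):
    """Cross-product ratio measuring the association between A and B."""
    return (both * neither) / (a_only * b_only)


# Hypothetical population: A and B each affect 10 percent, independently.
p_a = p_b = Fraction(1, 10)
population = {
    "both": p_a * p_b,
    "a_only": p_a * (1 - p_b),
    "b_only": (1 - p_a) * p_b,
    "neither": (1 - p_a) * (1 - p_b),
}

# Hypothetical probabilities of hospital admission for each group; persons
# with neither disease are admitted for other reasons.
admission = {
    "both": Fraction(1, 2),
    "a_only": Fraction(1, 10),
    "b_only": Fraction(1, 10),
    "neither": Fraction(1, 20),
}

# Composition of the hospital population.
hospital = {group: population[group] * admission[group] for group in population}

population_or = odds_ratio(**population)
hospital_or = odds_ratio(**hospital)

print(population_or)  # 1, no association in the general population
print(hospital_or)    # 5/2, an apparent positive association among admissions
```

The size and even the direction of the spurious association depend entirely on the assumed admission probabilities, which is why the potential magnitude of such a bias has to be judged case by case.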

However, a mere catalogue of biases arising from the possibly un- 
representative nature of a sample of cases and controls should not ipso 
facto invalidate any study findings. This is a substantive issue to be 
resolved on its merits for a specific investigation. Collateral evidence 
may provide information on the potential magnitude of bias and the size 
of spurious associations which could result. In some situations the 
difference between cases and controls may be so great that postulation 
of an unreasonably large bias would be required. Whether he consciously 
recognizes it or not, the investigator must always balance the risks 
confronting him and decide whether it is more important to detect an 
effect, when present, or to reject findings, when they may not reflect the 
true situation. If opportunities for further testing exist, one should not 
be too hasty in rejecting an association as an artifact arising from the 
method of data collection, and in foreclosing exploration of a potentially 
fruitful lead. 

Because of the important role retrospective studies play in studies of 
human genetics, mention may be made of a bias frequently encountered 
in studies dealing with the familial distribution of diseases. A frequently 
used procedure takes a group of diagnosed cases for a disease in question 
and a group of controls and compares the prevalence of this disease among 
relatives of the probands and controls. The bias arises from the unrepre- 
sentative nature of the probands with respect to familial distribution and 
is known in other fields as "the problem of the index case" or "the effect 
of method of ascertainment." It has long been recognized that the 
characteristics for a random sample of families will differ from those for 
families to whom the investigator's attention has been directed because 
the family rosters include individuals selected for study on the basis of a 
specified attribute. For example, data on family size (number of 
children) obtained from siblings, rather than parents, are biased, since 
two or three potential index cases are present in the population for two- 
and three-child families as opposed to one for one-child families and none 
for childless couples. The analogy for disease occurrence is apparent. 
Families with two or three cases of the disease under study may have 
double or triple the probability of being represented by individuals in 
source material and having a representative selected as a proband than 
families with only one case. An appropriate analysis for this situation 
in studies of family size and birth order has been discussed by Greenwood 
and Yule (86), which takes account of the probability of family repre- 
sentation in proband data. Haenszel (87) has applied their correction 
to gastric-cancer data reported by Videbaek and Mosbech (88) and found 
the correction to reduce the originally reported fourfold excess of gastric 
cancer among relatives of probands, as compared to relatives of controls, 
to one of about 60 percent. 
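
The family-size example can be made concrete with a small calculation. The
Python sketch below is illustrative only: the distribution of family sizes
is hypothetical, and the function names `sibling_weighted` and `corrected`
are assumptions introduced here, not the procedure of Greenwood and Yule.
It shows that reaching families through their children weights each family
in proportion to its number of children, and that dividing by that weight
recovers the distribution among families with at least one child.

```python
from fractions import Fraction

def sibling_weighted(family_dist):
    """Family-size distribution as seen when families are reached through a child.

    family_dist maps number of children -> share of families (hypothetical data).
    A family with k children offers k potential index cases, so its weight is k.
    """
    weighted = {k: k * p for k, p in family_dist.items()}
    total = sum(weighted.values())
    return {k: w / total for k, w in weighted.items()}

def corrected(observed_dist):
    """Undo the weighting by dividing each share by family size.

    Childless families can never be reached through a child, so they stay out.
    """
    unweighted = {k: p / k for k, p in observed_dist.items() if k > 0}
    total = sum(unweighted.values())
    return {k: w / total for k, w in unweighted.items()}

# Hypothetical population: equal shares of families with 0, 1, 2 and 3 children.
families = {k: Fraction(1, 4) for k in range(4)}
seen_via_children = sibling_weighted(families)

mean_true = sum(k * p for k, p in families.items())
mean_seen = sum(k * p for k, p in seen_via_children.items())
recovered = corrected(seen_via_children)

print("mean family size among families:", mean_true)
print("mean family size as reported by children:", mean_seen)
print("distribution after correction:", recovered)
```

The same weighting argument applies when the "children" are cases of a
disease and the families are reached through a proband.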

One remedy for the weakness of the retrospective approach to problems 
involving association of diseases and familial distribution would be to 
place greater reliance on forward observations of defined cohorts for 
data on these topics. 

Controls 

While easier accessibility to and lesser expense of hospital controls are 
important considerations, they should not deter one from collecting
control data for a sample representing a more general population, if the latter
are demonstrably superior. Some of the uncertainties about the supe- 
riority of hospital or general population controls arise from the need to 
maintain comparability in responses. The dependence of retrospective 
studies on comparability of responses from cases and controls cannot be 
overemphasized. When more accurate answers can be obtained from 
controls in a medical-care environment, the gain in comparability of 
responses for these controls could outweigh the other advantages to be 
derived from the more representative nature of general population controls. 
The difficulties may be illustrated by the experience with smoking 
histories. Hospital controls invariably yield a higher proportion of 
smokers for each sex than controls of comparable age drawn from the 
general population (27). Does this mean more complete smoking histories
are collected in hospitals or does it imply that smokers have higher hospital 
admission rates? If the first alternative is correct, hospital controls are 
the appropriate choice for measuring the association of smoking history 
with a given disease. The second alternative calls for general population 
controls and in this situation the use of hospital controls yields under- 
estimates of the degree of association. 
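
The second alternative can be illustrated numerically. In the sketch
below every figure is hypothetical (the share of smokers among cases, the
share in the general population, and the ratio of admission rates), and
the helper names are assumptions made for the illustration. It shows that
when smokers are admitted to hospital more readily, hospital controls
contain more smokers than the general population and the measured
association shrinks.

```python
def odds(share):
    """Odds corresponding to a proportion."""
    return share / (1.0 - share)

def hospital_smoker_share(population_share, admission_ratio):
    """Share of smokers among hospital controls.

    admission_ratio is the (assumed) admission rate of smokers relative to
    non-smokers for the control diagnoses.
    """
    weighted = population_share * admission_ratio
    return weighted / (weighted + (1.0 - population_share))

case_share = 0.70        # hypothetical share of smokers among cases
population_share = 0.40  # hypothetical share of smokers in the general population
admission_ratio = 1.5    # hypothetical: smokers admitted 1.5 times as often

hospital_share = hospital_smoker_share(population_share, admission_ratio)
or_population = odds(case_share) / odds(population_share)
or_hospital = odds(case_share) / odds(hospital_share)

print("smokers among hospital controls:", round(hospital_share, 3))
print("association using general population controls:", round(or_population, 2))
print("association using hospital controls:", round(or_hospital, 2))
```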

Dual hospital and general population controls would have some merit. 
If control data from the two sources were in agreement, this would rule
out some alternative interpretations of the findings. In the event of dis- 
agreement, its extent could be measured and alternate calculations made 
on the degree of association between an event and a suspect antecedent 
characteristic. Where the two sets of controls lead to substantially dif- 
ferent results, a cautious and conservative interpretation is indicated. 

Some topics, such as those bearing on sex practices and use of alcohol, 
may be amenable to study only within a clinical setting, and the collec- 
tion of general population data on these items may prove impractical. 
The limitations of general population controls in this regard may have 
been overstressed, and empirical trials to test what information can be 
collected in household surveys should be encouraged instead of dismissing 
the possibility with no investigation whatsoever. Whelpton and Freed- 
man, for example, have reported some success in collecting histories of 
contraceptive practices in interviews of a random sample of housewives (89).

When hospital controls are chosen, some precautions may be built into 
the study. Within limitations on the nature of controls imposed by a 
study hypothesis, controls drawn from a wide variety of diseases or ad- 
mission diagnoses should be preferred. This permits examination of the 
distribution of the study characteristics among subgroups to check on 
internal consistency or variation among controls. This affords protection 
against two sources of error: a) attributing an association to the disease 
under investigation, when the effect is really linked to the diagnosis from 
which controls were drawn, and b) failure to detect an effect because both 
the study and control diseases are associated with the suspect factor. 
The latter is far from impossible. Both tuberculosis and bronchitis have 
exhibited association with smoking history and the use of one disease or 
the other as a control could easily lead to missing the association with 
smoking history. Similarly, patients with coronary artery disease would 
not constitute suitable controls for a study of the relationship of smoking 
and bladder cancer and vice versa, since the investigator would probably 
conclude that smoking was not related to either disease, when in truth it 
appears related to both. When there is definite evidence that two diseases 
are associated, for example, pernicious anemia and stomach cancer, the 
use of one as a control for the other is contraindicated, unless the study is 
specially designed to elucidate some aspects of the relationship. 

It is always advantageous to include several items in a questionnaire 
for which general population data are available. This could be considered 
a partial substitute for dual hospital and general population controls. 
Disparity among cases, hospital controls, and general population controls 
on several general characteristics unrelated to the study hypothesis may 
be regarded as warning signals of the unrepresentative nature of the 
hospital cases and controls. 

Where possible, interviews should be conducted without knowledge 
of the identity of cases and controls to guard against interviewer bias, 
although administrative reasons will often prevent attainment of "blind"
interviews. In cooperative studies employing several interviewers, the
magnitude of interviewer bias may be diminished, since it is unlikely 
that all interviewers will share the same bias in concert. In special 
circumstances, such as those prevailing at Roswell Park Memorial 
Institute, admissions may be interviewed before diagnosis, and hence 
before the identity of cases and controls is established. This feature 
requires a comprehensive, general purpose interview routinely admin- 
istered to all admissions, which may restrict its use to publicly supported 
institutions diagnosing and treating neoplastic diseases or other specialized 
disease entities. Several epidemiological contributions for specific cancer
sites have been based on the unique control data available from Roswell 
Park Memorial Institute (9, 11, 12, SO, 40-48), which are particularly 
valuable for collation with studies depending on more conventional 
sources of controls to evaluate interviewer bias and related issues. 

Some patients interviewed as diagnosed cases will subsequently have 
their diagnoses changed. This may be turned to advantage. If scrutiny 
of the data for the erroneously diagnosed group reveals they had histories 
resembling those for the control rather than the case series, as Doll and 
Hill found in their study of smoking and lung cancer (SI), this would 
constitute evidence against interviewer bias. 

In investigations of a cancer site the association of a factor may often 
be restricted to a specific histologic type or a well-defined portion of an 
organ. The finding that epidermoid and undifferentiated pulmonary 
carcinoma is more strongly related to smoking history than adenocar- 
cinoma of the lung is now well established. The range of explanations 
for the observed deficit of epidermoid carcinoma of the cervix in Jewish 
women as compared to other white women is greatly circumscribed by 
the presence of about equal numbers of adenocarcinoma of the corpus in 
both groups. When these finer diagnostic details or their significance are 
unknown to the interviewer, another check on interviewer bias is provided. 
Furthermore, the confirmation in repeated studies of an association 
limited to a specific histologic type or a detailed site will lend credence 
to an etiological interpretation of the association. Repeated confirma- 
tion is an essential element. Otherwise, a very specific association may 
be a reflection of the multiple comparison problem; if enough contrasts 
are created by fractionation of a single set of data, some apparently
significant result is likely to appear. For this reason it would be desirable 
to reproduce such provocative results as Wynder's finding that use of 
alcohol was more strongly associated with cancer of the extrinsic larynx 
than of the intrinsic larynx (18), and Billington's report that prepyloric 
and cardiac neoplasms of the stomach were associated with blood group 
A and those located in the fundus with blood group O (44). 
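
The multiple comparison problem mentioned above can be quantified under a
simplifying assumption. The sketch below treats the contrasts as
independent, which contrasts formed by fractionating a single set of data
will not strictly be, and uses the conventional 5 percent level as an
assumed default; the function name is hypothetical.

```python
def chance_of_false_lead(n_contrasts, alpha=0.05):
    """Probability that at least one of n independent contrasts is 'significant'
    at level alpha when no real association exists in any of them."""
    return 1.0 - (1.0 - alpha) ** n_contrasts

for n in (1, 5, 10, 20):
    print(n, "contrasts:", round(chance_of_false_lead(n), 3))
```

With twenty such contrasts the chance of at least one apparently
significant result is well over one half, which is why repeated
confirmation carries so much weight.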

Discussion of matched controls in relation to the analysis and the 
computation of relative risks is deferred to a later section. One con- 
sideration on matched controls arising in the planning and development 
of a study should be mentioned here. Obviously, if the risk of disease 
changes with age an apparent association of the disease with other age- 
related factors may result. Other apparent associations with race, sex,
nativity, etc., may arise in a similar manner. In devising rules for 
selecting controls, those factors known or strongly suspected to be related 
to disease occurrence should be taken into account if unbiased and more 
precise tests of the significance of the factors under investigation are 
desired. A sensible rule is to match those factors, such as age and sex, 
the effect of which may be conceded in advance and for which strong 
evidence is available from other sources, such as mortality data and 
morbidity surveys. When a factor is matched, however, it is eliminated 
as an independent study variable; it can be used only as a control on 
other factors. This suggests caution in the amount of matching attempted.
If the effect of a factor is in doubt, the preferable strategy will be not to 
match but to control it in the statistical analysis. While the logical 
absurdity of attempting to measure an effect for a factor controlled by 
matching must be obvious, it is surprising how often investigators must 
be restrained from attempting this. 

When a minimum of matching is involved, the importance of estab- 
lishing, precisely and in advance, the method by which controls are 
selected for study increases. The rule should be rigid and unambiguous 
to avoid creating effects by subconscious selection and manipulation of 
controls. The problem is similar to that encountered in therapeutic 
trials where a protocol spelling out all the contingencies and actions to 
be taken in advance is, along with random assignment of cases and con- 
trols, the major bulwark against bias. 

To reduce interview time and expense there are advantages in pro- 
cedures for selecting controls which permit a case and the corresponding 
controls to be interviewed in a single session, particularly if travel to 
several institutions is involved. In practice, this favors selecting controls 
from a hospital patient census rather than from hospital admission lists. 
The difficulty with hospital admissions is that there is no guarantee that 
the controls will be available in the hospital at the time the diagnosed 
case is interviewed. This point seems more important than the fact 
that patients with diagnoses requiring long-term stays are overrepresented 
in a current hospital census (46). If the latter is an important issue, it
may be handled in analysis through subclassification of controls by 
diagnosis. 

Normally there will be little difficulty in reconciling these considera- 
tions into a harmonious set of rules. The items to be matched often 
lend themselves to a procedure for specifying controls. In a recent 
study on female lung cancer we found that the definition of two controls 
as the next older and the next younger women in the same hospital 
service, present on the day the case was interviewed, met the requirements 
just outlined (27). The controls were uniquely defined, the records 
establishing their identity were readily available on the service floor, 
interviews could be completed in one day, and a provision for balancing 
ages of cases and controls was incorporated. Simultaneous interviews 
of cases and controls may be more than an administrative convenience. 
If the prevalence of the associated factor is rapidly shifting over time,
failure to control time of interview could obscure or exaggerate an 
association. 
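
A rigid selection rule of the kind just described can be written down
mechanically. The following sketch is only an illustration: the record
layout (`Patient`), the census data, and the handling of patients whose
age equals that of the case (they are skipped) are assumptions made here
and are not specified in the text.

```python
from typing import NamedTuple

class Patient(NamedTuple):
    patient_id: str
    age: float
    service: str

def pick_controls(case, census):
    """Return (next_younger, next_older) among patients on the case's service.

    census is the list of patients present on the day the case is interviewed.
    Either element is None when no such patient is present.
    """
    same_service = [p for p in census
                    if p.service == case.service and p.patient_id != case.patient_id]
    younger = [p for p in same_service if p.age < case.age]
    older = [p for p in same_service if p.age > case.age]
    next_younger = max(younger, key=lambda p: p.age) if younger else None
    next_older = min(older, key=lambda p: p.age) if older else None
    return next_younger, next_older

# Hypothetical census for one interview day.
census = [
    Patient("c1", 61.0, "medicine"),   # the diagnosed case
    Patient("p2", 55.0, "medicine"),
    Patient("p3", 58.5, "medicine"),
    Patient("p4", 64.0, "medicine"),
    Patient("p5", 70.0, "medicine"),
    Patient("p6", 62.0, "surgery"),    # different service, never eligible
]
case = census[0]
younger_control, older_control = pick_controls(case, census)
print(younger_control.patient_id, older_control.patient_id)
```

Because the rule leaves no discretion to the interviewer, two people
applying it to the same census would select the same controls.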

Some Statistical Tools 

To progress further, questions on the representative nature of the case 
and control series must have been resolved affirmatively. With this 
condition in mind, let us suppose that a controlled retrospective study 
has been conducted and that the number of diseased cases, $N_1$, consists
of $A$ individuals with the factor being investigated and $B$ free of the
factor, while the number of controls, $N_2$, consists of $C$ individuals with,
and $D$ individuals without the factor. Let $M_1 = A + C$, $M_2 = B + D$,
$T = N_1 + N_2 = M_1 + M_2 = A + B + C + D$. What statistical evidence
is there for the presence of an association and what is an appropriate
measure of the strength of the association?

A commonly employed statistical test of association is the chi-square 
test on the difference between the cases and controls in the proportion of 
individuals having the factor under test. A corrected chi square may be 
calculated routinely as 

$$\chi^2 = \frac{(|AD - BC| - \tfrac{1}{2}T)^2\,T}{N_1 M_1 N_2 M_2}$$

and tested as a chi square with 1 degree of freedom in the usual manner. 
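
As an illustration, the corrected chi square can be computed directly
from the four cell counts. The sketch below is a minimal one under stated
assumptions: the counts are hypothetical, the function name is invented
for the example, and the guard that sets the corrected deviation to zero
when $|AD - BC|$ is smaller than $T/2$ is an addition not stated in the
text.

```python
def corrected_chi_square(a, b, c, d):
    """Corrected chi square (1 degree of freedom) for a 2 x 2 table.

    a, b: cases with and without the factor; c, d: controls with and without.
    """
    n1, n2 = a + b, c + d          # cases, controls
    m1, m2 = a + c, b + d          # with factor, free of factor
    t = n1 + n2
    # Guard (an assumption added here): never let the correction overshoot zero.
    deviation = max(abs(a * d - b * c) - t / 2, 0)
    return deviation ** 2 * t / (n1 * m1 * n2 * m2)

# Hypothetical study: 40 of 100 cases and 20 of 100 controls have the factor.
chi_square = corrected_chi_square(40, 60, 20, 80)
print(round(chi_square, 3))
```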

A suggested measure of the strength of the association of the disease 
with the factor is the apparent risk of the disease for those with the 
factor, relative to the risk for those without the factor. Consider that 
a population falls into the four possible categories and in the proportions 
indicated by the following table: 

                   With factor    Free of factor    Total

With disease       $P_1$          $P_2$             $P_1 + P_2$
Free of disease    $P_3$          $P_4$             $P_3 + P_4$

Total              $P_1 + P_3$    $P_2 + P_4$       1

The proportion of persons with the factor having the disease is
$P_1/(P_1 + P_3)$, while the corresponding proportion for those free of the
factor is $P_2/(P_2 + P_4)$. Relatively then, the risk of the disease for those
with the factor is $P_1(P_2 + P_4)/P_2(P_1 + P_3)$. On a sampling basis this
quantity may be estimated either by drawing a sample of the general
population and estimating $P_1$, $P_2$, $P_3$, and $P_4$ therefrom or estimating
$P_1/(P_1 + P_3)$ and $P_2/(P_2 + P_4)$ separately from samples of persons with,
and persons free of, the factor.

It may be noted, however, that if the relative risk as defined equals 
unity, then the quantity $P_1P_4/P_2P_3$ will also equal unity. Further, for
diseases of low incidence where the values for $P_1$ and $P_2$ are small in
comparison with $P_3$ and $P_4$ it follows, as has been pointed out by Cornfield
(81), that $P_1P_4/P_2P_3$ is also a close approximation to the relative risk.
This latter approximate relative risk can properly be estimated from 
the two sample approaches described or from samples drawn on a retro- 
spective basis; that is, separate samples of persons with, and persons 
free of, the disease. The sample proportions of persons with, and free
of, the factor in the retrospective approach provide estimates of
$P_1/(P_1 + P_2)$ and of $P_2/(P_1 + P_2)$ from the sample having the disease and
of $P_3/(P_3 + P_4)$ and of $P_4/(P_3 + P_4)$ from the disease-free sample. The
estimate of $P_1P_4/P_2P_3$ is obtained by appropriate multiplication and
division of these four quantities.

Whichever of the three methods of sampling is employed, the estimate 
of the approximate relative risk, $P_1P_4/P_2P_3$, reduces simply to $AD/BC$,
where A, B, C, and D are defined in the manner stated in the first para- 
graph of this section. Also, the chi-square test of association given, 
which is essentially a test of whether or not the relative risk is unity, is 
equally applicable to all three sampling methods. 
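
The relation between the relative risk as defined and its approximation
can be checked numerically. In the sketch below the population
proportions and sample counts are hypothetical and the function names are
assumptions made for the illustration; it contrasts a disease of low
incidence, where the approximation is close, with a common one, where it
is not.

```python
def relative_risk(p1, p2, p3, p4):
    """Risk of disease for those with the factor relative to those without:
    P1(P2 + P4) / P2(P1 + P3)."""
    return (p1 * (p2 + p4)) / (p2 * (p1 + p3))

def approximate_relative_risk(p1, p2, p3, p4):
    """Cross-product approximation P1 P4 / P2 P3."""
    return (p1 * p4) / (p2 * p3)

def sample_estimate(a, b, c, d):
    """Estimate AD/BC from cases (a with, b without the factor)
    and controls (c with, d without the factor)."""
    return (a * d) / (b * c)

# Hypothetical rare disease: 0.6 percent of the population affected.
rare = (0.002, 0.004, 0.198, 0.796)
# Hypothetical common disease: 12 percent of the population affected.
common = (0.04, 0.08, 0.16, 0.72)

rr_rare, approx_rare = relative_risk(*rare), approximate_relative_risk(*rare)
rr_common, approx_common = relative_risk(*common), approximate_relative_risk(*common)

print("rare disease:  ", round(rr_rare, 3), "vs", round(approx_rare, 3))
print("common disease:", round(rr_common, 3), "vs", round(approx_common, 3))
print("AD/BC for hypothetical counts:", round(sample_estimate(40, 60, 20, 80), 3))
```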

In the foregoing the two basic statistical tools of the epidemiologist 
for retrospective studies, the chi-square significance test and the measure 
of a relative risk, have been described for a relatively simple situation, 
one in which to all intents there is a single homogeneous population. 
The more complex situations confronting the epidemiologist in actual 
practice and the corresponding modifications in the statistical procedures 
will be presented. 

Two other statistical problems may be noted here. One is the deter- 
mination of how large a retrospective study to conduct. This depends 
on how sure we wish to be that the study will yield clear evidence that the 
relative risk is not unity, when it in fact differs from unity to some im- 
portant degree. Application of this statistical technique requires re- 
interpreting a relative risk greater than unity into the corresponding 
difference between the diseased and the disease-free groups in the propor- 
tion of persons with the factor. For example, suppose an attack rate of 
20 percent, given a normal rate of 10 percent, is worth uncovering. Sup- 
pose further that the factor associated with the increased disease rate 
affects 20 percent of the population. The population would then be 
distributed as follows: 

                   With factor    Free of factor    Total

With disease       $P_1 = 4\%$    $P_2 = 8\%$       12%
Free of disease    $P_3 = 16\%$   $P_4 = 72\%$      88%

Total              20%            80%               100%

The required retrospective study should be large enough to differentiate 
between a 33.3 percent [$P_1/(P_1 + P_2)$] relative frequency of the factor
among diseased individuals and an 18.2 percent [$P_3/(P_3 + P_4)$] relative
frequency among disease-free individuals. The usual procedures for
determining required sample sizes to differentiate between two binomial
proportions are applicable in this situation. 
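
The two relative frequencies quoted above follow directly from the table,
and one common way of turning them into a study size is the
normal-approximation formula for comparing two proportions. The sketch
below is a hedged illustration: the two-sided 5 percent significance
level, the 80 percent power, the equal numbers of cases and controls, and
the particular formula are all assumptions made here, since the text
refers only to "the usual procedures."

```python
from math import ceil, sqrt
from statistics import NormalDist

def factor_frequencies(p1, p2, p3, p4):
    """Share with the factor among the diseased and among the disease-free."""
    return p1 / (p1 + p2), p3 / (p3 + p4)

def sample_size_per_group(freq_cases, freq_controls, alpha=0.05, power=0.80):
    """Approximate number of cases (and of controls) needed to tell the two
    proportions apart. alpha (two-sided) and power are assumed values."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    pooled = (freq_cases + freq_controls) / 2
    spread_null = sqrt(2 * pooled * (1 - pooled))
    spread_alt = sqrt(freq_cases * (1 - freq_cases)
                      + freq_controls * (1 - freq_controls))
    n = ((z_alpha * spread_null + z_beta * spread_alt)
         / (freq_cases - freq_controls)) ** 2
    return ceil(n)

# Proportions from the table: P1 = 4%, P2 = 8%, P3 = 16%, P4 = 72%.
cases_freq, controls_freq = factor_frequencies(0.04, 0.08, 0.16, 0.72)
n_each = sample_size_per_group(cases_freq, controls_freq)

print("factor among diseased:", round(100 * cases_freq, 1), "percent")
print("factor among disease-free:", round(100 * controls_freq, 1), "percent")
print("cases and controls needed (each), under the stated assumptions:", n_each)
```

Other choices of significance level or power would give different sizes;
the point is only that the problem reduces to a comparison of two
binomial proportions.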

While rigorous extension of this procedure to the more complex situa- 
tions to be considered is not too simple, it can readily be adapted to 
secure approximations of the necessary study size. One might, for 
example, start by estimating the over-all required sample size following 
the procedure just indicated for differentiating between two sample 
proportions, assuming that cases and controls are homogeneous with
respect to factors other than the one under investigation. Suppose on an 
over-all basis it is determined that the study should include $N_1 = 200$
disease cases and $N_2 = 200$ controls, but that the study data will be
subclassified for purposes of analysis. Ignoring mathematical complications
resulting from variations in binomial parameter values within individual
subclassifications, we may interpret the above values of $N_1$ and $N_2$ as
roughly meaning that the total information required for the study is
$N_1N_2/(N_1 + N_2) = 100$. The objective should then be to assign values
to $N_{1i}$ and $N_{2i}$ to obtain a total score of 100 for the cumulated
information over all the subclassifications, $\sum N_{1i}N_{2i}/(N_{1i} + N_{2i})$,
where $N_{1i}$ and $N_{2i}$ are the number of cases and controls in the $i$th
subclassification.

This formulation of required total information brings out some aspects 
of retrospective study planning which are considered later in this paper. 
For instance, if any $N_{1i}$ or $N_{2i}$ is zero, no information is available from
that particular category. Much of the benefit of a large $N_{1i}$ (or $N_{2i}$) in
any particular category is lost if the corresponding $N_{2i}$ (or $N_{1i}$) is small.
It is normally desirable to have $N_{1i}$ and $N_{2i}$ values commensurate with
each other; for fixed totals, $\sum N_{1i}$ and $\sum N_{2i}$, the total information in an
investigation will be at a maximum if the degree of crossmatching is equal
in all subclassifications with a constant case-control ratio of $\sum N_{1i}/\sum N_{2i}$.
Maintaining a fixed case-control ratio among categories need not preclude 
assigning more cases and controls to specific categories. Larger numbers 
may be desired for categories of crucial interest to the study or for cate- 
gories which represent greater segments of the population. 

The information formula also reveals the limits for adjusting the relative 
numbers of diseased and control cases. It shows that if the number of 
controls ($N_2$) becomes indefinitely large, the required $N_1$ value can at most
be reduced only by a factor of 2. Furthermore, this reduction in required 
diseased cases may be inappropriate if one wishes to obtain clear results 
for the separate subcategories. 
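
The behavior of the information formula can be seen in a few lines of
arithmetic. The sketch below uses hypothetical allocations of cases and
controls to subclassifications, and the function names are assumptions
made for the illustration.

```python
def information(n_cases, n_controls):
    """Information N1*N2/(N1 + N2) contributed by one subclassification."""
    if n_cases == 0 or n_controls == 0:
        return 0.0
    return n_cases * n_controls / (n_cases + n_controls)

def total_information(strata):
    """Cumulated information over (cases, controls) pairs."""
    return sum(information(n1, n2) for n1, n2 in strata)

# 200 cases and 200 controls in a single homogeneous group.
print(information(200, 200))

# The same totals split over two subclassifications (hypothetical allocations).
balanced = [(100, 100), (100, 100)]      # constant case-control ratio
unbalanced = [(150, 50), (50, 150)]      # poor cross-matching
print(total_information(balanced), total_information(unbalanced))

# With an indefinitely large number of controls, 100 cases supply nearly the
# same information as 200 cases with 200 controls: a saving of a factor of 2.
print(round(information(100, 10**9), 3))
```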

The study size requirements suggested by the information formula may 
be seriously in error if the binomial parameters show excessive variation 
among subcategories. Ordinary precautions, however, should serve to 
keep the formula useful. In some situations it may be desirable to modify 
the information formula indicated above to reflect the contribution due 
to variation in the binomial parameters involved. 

The second statistical procedure involves setting reasonable limits on 
the relative risk when it is in fact different from unity. For the homo- 
geneous case considered, formulas for such limits have been published in 
(46). The chi-square test as stated is essentially a test of whether or not
the confidence limits include unity. Extension of this procedure to more 
complex cases is fairly involved and depends primarily on the measure of 
relative risk adopted. In the absence of a clear justification for any single 
measure of over-all relative risk, the burden of extremely involved
computation of confidence limits in such cases would not seem warranted.
Instead, we feel that emphasis should be directed to obtaining an over-all 
measure of risk, coupled with an over-all test of statistical significance.

Statistical Procedures for Factor Control 

A major problem in any epidemiological study is the avoidance of spu- 
rious associations. It has been remarked that where the risk of disease 
changes with age, apparent association of the disease with other age- 
related factors can result. However, there are appropriate statistical 
procedures for controlling those factors known or suspected to be related 
to disease occurrence. They serve not only to remove bias from the 
investigation but, in addition, can add to its precision. 

Two simple procedures for obtaining factor control may first be men- 
tioned. One is simply to restrict the investigation to individuals homo- 
geneous on the factors to be controlled. For this situation the statistical 
procedures already outlined would be appropriate. The potential number 
of individuals available for such a study would, of course, be sharply 
restricted. 

There is also the matching case method. A sample of N diseased 
individuals is drawn and the characteristics of each individual noted with 
respect to the control factors. Subsequently, a sample of N well indi- 
viduals is drawn, with each individual matched on the control factors to 
one of the diseased individuals. The statistical procedures to be presented 
can be shown to cover the matched-sample approach as a special case, 
and a discussion of the analysis of such data will be given in that context. 
Some difficulties of the matched-sample study may be mentioned here.
One is that when matching is made on a large number of factors, not even 
the fiction of a random sampling of control individuals can be maintained. 
Instead, one must be grateful for each matching control available.
Another difficulty is that the method cannot be applied to factors under 
control, since diseased and control individuals are identical with respect 
to these factors. Conversely, factors under study in matched samples 
cannot themselves be controlled statistically. They can be analyzed 
separately or in particular conjunctions but cannot be employed as control 
factors. 

An alternative to case matching is to draw independent samples of 
cases and controls, and adjust for other factors in the analysis. This 
approach requires simply the classification of individuals according to the 
various control and study factors desired, and an analysis for each separate 
subclassification as well as an appropriate summary analysis. Its success 
will depend on a reasonable degree of cross-matching between observations 
on diseased and control persons. In a small study various devices for 
reducing the number of subclassifications and for increasing the chances of 
cross-matching may be necessary, including a limit on the number of 
factors on which individuals are classified in any one analysis and the use 
of broad categories for any particular classification. Thus, a 10-year 
interval for age classification might permit a reasonable degree of cross- 
matching, whereas a 1-month interval would not. 

The need for some degree of deliberate matching, even when the 
classification approach is employed, can be seen. If the disease under 
consideration occurs at advanced ages, little cross-matching would result 



Vol. SO, No. 4, April 1959 



734 



MANTEL AND HABNSZEL 



if controls were selected from the general population. The remedy lies in 
deliberately selecting controls from the same age groups anticipated for 
persons with the disease, perhaps even matching one or more controls on 
age for each diseased person. This principle can be extended to matching 
on several control factors, solely for the purpose of increasing the extent of
cross-matching in the analysis. 

One of the subtle effects which can occur in a retrospective study, even 
with careful planning, may be pointed out. It can be shown, for instance, 
that within a given age interval the average age of individuals with cancer 
of certain sites will be greater than the average age of individuals from the 
general population in the same age interval. This can arise when incidence 
increases rapidly with age and may pose a serious problem with broad age 
intervals. This effect can be offset by close matching of cases and controls 
on age in drawing samples, even though they are classified by a broad age 
category in the analysis. 

When a random sample of diseased and disease-free individuals is 
classified according to various control factors the distribution of the factor 
under study within the ith classification may be represented as follows: 

                   With factor    Free of factor    Total

With disease       $A_i$          $B_i$             $N_{1i}$
Free of disease    $C_i$          $D_i$             $N_{2i}$

Total              $M_{1i}$       $M_{2i}$          $T_i$

Within this subgroup the approximate relative risk associated with the
disease may be written as $A_iD_i/B_iC_i$. One may compare the observed
number of diseased persons having the factor, $A_i$, with its expectation
under the hypothesis of a relative risk of unity, $E(A_i) = N_{1i}M_{1i}/T_i$.
The discrepancy between $A_i$ and $E(A_i)$ (which is also the discrepancy for
any other cell within a 2 × 2 table) can be tested relative to its variance
which, subject to the fixed marginal totals $N_{1i}$, $N_{2i}$, $M_{1i}$, and $M_{2i}$, is
given by $V(A_i) = N_{1i}N_{2i}M_{1i}M_{2i}/T_i^2(T_i - 1)$. The corrected chi square
with 1 degree of freedom $(|A_i - E(A_i)| - \tfrac{1}{2})^2/V(A_i)$ reduces in this case to
$(|A_iD_i - B_iC_i| - \tfrac{1}{2}T_i)^2(T_i - 1)/N_{1i}N_{2i}M_{1i}M_{2i}$. This formula for the variance
of $A_i$ is obtained as the variance of the binomial variable $N_1PQ$ ($P = M_1/T$,
$Q = M_2/T$), multiplied by a finite population correction factor
$(T - N_1)/(T - 1) = N_2/(T - 1)$. The earlier chi-square formula, which is ordinarily
used, essentially employs a finite population correction factor of $N_2/T$.

There is thus a difference between the two chi-square formulas of a 
factor of $(T - 1)/T$ which, though trivial for any single significance test
with respectably large T, can become important in the over-all signifi- 
cance test. It is with the latter formula, just presented, that chi square 
is computed as the ratio of the square of a deviation from its expected 
value to its variance. 
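
The relation between the two chi-square formulas can be verified
numerically. The sketch below is illustrative: the cell counts are
hypothetical and the function names are assumptions; it computes the
expectation and variance of $A$ for a single table and confirms that the
two chi squares differ by the factor $(T - 1)/T$.

```python
def stratum_moments(a, b, c, d):
    """Expectation and variance of A for fixed marginal totals."""
    n1, n2 = a + b, c + d
    m1, m2 = a + c, b + d
    t = n1 + n2
    expected = n1 * m1 / t
    variance = n1 * n2 * m1 * m2 / (t ** 2 * (t - 1))
    return expected, variance

def chi_square_from_variance(a, b, c, d):
    """(|A - E(A)| - 1/2)^2 / V(A), the formula of this section."""
    expected, variance = stratum_moments(a, b, c, d)
    return (abs(a - expected) - 0.5) ** 2 / variance

def chi_square_usual(a, b, c, d):
    """(|AD - BC| - T/2)^2 T / (N1 M1 N2 M2), the ordinarily used formula."""
    n1, n2 = a + b, c + d
    m1, m2 = a + c, b + d
    t = n1 + n2
    return (abs(a * d - b * c) - t / 2) ** 2 * t / (n1 * m1 * n2 * m2)

# Hypothetical table: 40 of 100 cases and 20 of 100 controls have the factor.
table = (40, 60, 20, 80)
expected, variance = stratum_moments(*table)
ratio = chi_square_from_variance(*table) / chi_square_usual(*table)
print(expected, round(variance, 4), round(ratio, 4))
```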

The adjustment for control factors is at this point resolved for the result- 
ing separate subclassifications. The problem of over-all measures of 
relative risk and statistical significance still remains. A reasonable over-all
significance test which has power for alternative hypotheses, where there 
is a consistent association in the same direction over the various sub- 
classifications between the disease and a study factor, is provided by 
relating the summation of the discrepancy between observation and 
expectation to its variance. The corrected chi square with 1 degree of
freedom then becomes $(|\sum A_i - \sum E(A_i)| - \tfrac{1}{2})^2/\sum V(A_i)$, where $E(A_i)$ and
$V(A_i)$ are defined as above.
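
The over-all test can be sketched in a few lines. The example below is a
minimal illustration with hypothetical counts for two subclassifications;
the function name and the data are assumptions, and no provision is made
for subclassifications with fewer than two individuals, for which the
variance is undefined.

```python
def overall_chi_square(tables):
    """Corrected over-all chi square (1 degree of freedom).

    tables is a sequence of (A, B, C, D) counts, one per subclassification:
    cases with and without the factor, then controls with and without.
    """
    sum_a = sum_expected = sum_variance = 0.0
    for a, b, c, d in tables:
        n1, n2 = a + b, c + d
        m1, m2 = a + c, b + d
        t = n1 + n2
        sum_a += a
        sum_expected += n1 * m1 / t
        sum_variance += n1 * n2 * m1 * m2 / (t ** 2 * (t - 1))
    return (abs(sum_a - sum_expected) - 0.5) ** 2 / sum_variance

# Hypothetical data, e.g. two age groups.
strata = [(10, 40, 5, 45), (30, 20, 15, 35)]
chi_square = overall_chi_square(strata)
print(round(chi_square, 3))
```

Because the discrepancies are summed before squaring, associations in
opposite directions in different subclassifications tend to cancel, which
is consistent with the test having power against a consistent association
in one direction.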

The specification of a summary estimate of the relative risk associated 
with a factor is not so readily resolved as that for an over-all significance 
test, and involves consideration of alternate approaches to a weighted 
average of the approximate relative risks for each subclassification 
($A_iD_i/B_iC_i$). If one could assume that the increased relative risk
associated with a factor was constant over all subclassifications, the estimation
problem would reduce to weighting the several subclassification estimates 
according to their respective precisions. The complex maximum likeli- 
hood iterative procedure necessary for obtaining such a weighted estimate 
would seem to be unjustified, since the assumption of a constant relative 
risk can be discarded as usually untenable. 

Another possible criterion for obtaining a summary estimate of relative 
risk would involve weighting the risks for subclassification by "impor- 
tance." A twofold increase of a large risk is more important than a 
twofold increase of a small risk. An increased risk for a large group is 
more important than one for a small group. An increased risk for young 
individuals may be more important than for older individuals with a 
shorter life expectation. Difficulties arise in attempts to weight relative 
risk by measures of importance. For one, the necessary information on 
importance, in terms of the size of the populations affected or in terms 
of the absolute level of rates prevailing in the subgroups, is generally not 
contained within the scope of the investigation. A problem in definition 
of the precise terms of the weighted comparison also appears. Does 
one want to adjust the risks of disease among persons with the factor to 
the distribution of the population without the factor, or vice versa, or 
adjust the risks for the populations with and without the factor to a 
combined standard population? These procedures, and the different 
phrasing of the comparisons which they entail, could yield different 
answers. If only a small proportion of the population with the factor was 
in a subcategory with a high relative risk, while most of the factor-free 
population fell into this subcategory, and in other categories the relative 
risk associated with the factor was less than unity, the factor would appear 
to exert a protective influence under one set of weights but a harmful 
effect under the other. 

Published instances of summary relative risks do not fall clearly into 
either of the two categories, weighting by precision or weighting by
importance. They do follow an approach usually employed in age-adjusting
mortality data. Since the relative risk for a single 2 × 2 table can be
obtained from the incidence of the factor among diseased and well indi- 
viduals, the problem would appear translatable into terms of obtaining 



Vol. a*. No. 4, April 1959 



736 



MANTEL AND HAENSZEL 



over-all, category-adjusted incidence figures. Direct or indirect methods 
of adjustment can be used, employing as a standard of reference the fre- 
quency distribution or rates corresponding to the sample of diseased 
persons, of controls, or the diseased persons and controls combined. 

While such adjustment procedures provide weighting by importance 
in their customary application to mortality rates, this is not so in the 
relative risk situation. This may be illustrated in the following extreme 
example. Suppose that in each of two subcategories the approximate 
relative risk for a contrast between the presence and absence of a factor 
is about 5, which arises in the first subcategory from contrasting per- 
centages of 1 and 5, and in the second subcategory from contrasting per- 
centages of 95 and 99. If these percentages were based on equal numbers 
of individuals, all methods of category adjusting would yield contrasting 
adjusted summary percentages of 46 and 52, and a resultant relative risk 
of slightly less than 1.3. Some other approach for obtaining category- 
adjusted relative risks would seem desirable. However, to the extent 
that such extreme situations are not encountered in actual practice, results 
based on these more conventional adjustment procedures will not be 
grossly in error. 

A suggested compromise formula for over-all relative risk is given by
$R = \sum (A_i D_i / T_i) / \sum (B_i C_i / T_i)$. As a weighted average of relative risks
this formula would, in the illustration given, yield the over-all relative
risk of 5 found in each of the two subcategories. The weights are of the
order $N_{1i} N_{2i} / (N_{1i} + N_{2i})$ and as such can be considered to weight
approximately according to the precision of the relative risks for each subcategory.
The weights can also be regarded as providing a reasonable weighting
by importance.
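
As a rough check on the illustration, the following Python sketch applies the compromise formula to two hypothetical subcategories of 100 diseased and 100 well individuals each. These counts are an assumption chosen to match the contrasting percentages 1 and 5, and 95 and 99; they are not data from the paper, and the function name `summary_relative_risk` is illustrative only.

```python
def summary_relative_risk(tables):
    """R = sum(A*D/T) / sum(B*C/T) over subcategories.

    Each table is (A, B, C, D): diseased with/without the factor,
    then well with/without the factor.
    """
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den


# Hypothetical counts (assumption): 100 diseased and 100 well per subcategory.
tables = [
    (5, 95, 1, 99),   # factor present in 5% of diseased, 1% of well
    (99, 1, 95, 5),   # factor present in 99% of diseased, 95% of well
]

# Approximate relative risk within each subcategory.
within = [a * d / (b * c) for a, b, c, d in tables]
summary = summary_relative_risk(tables)

# Pooling the two subcategories into one table before forming the ratio.
A, B, C, D = (sum(col) for col in zip(*tables))
pooled = A * D / (B * C)

print(within, summary, pooled)
```

With these counts the two subcategory ratios and the summary value coincide at about 5, while the ratio formed after pooling falls close to unity.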

An interesting property of this summary relative risk formula is that it
equals unity only when $\sum A_i = \sum E(A_i)$ and hence the corresponding
chi square is zero. From the fact that $A_i - E(A_i) = (A_i D_i - B_i C_i)/T_i$,
it follows that when $\sum A_i = \sum E(A_i)$, $\sum A_i D_i/T_i$ will equal $\sum B_i C_i/T_i$, chi
square will be zero, and $R$ will be unity. The chi-square significance test
can thus be construed as a significance test of the departure of $R$ from unity.
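
The identity used in this argument follows from the definitions, writing $N_{1i} = A_i + B_i$, $M_{1i} = A_i + C_i$, and $T_i = A_i + B_i + C_i + D_i$ (the usual marginal totals of the $i$th fourfold table, assumed here to match the earlier schematic representation):

```latex
\begin{aligned}
A_i - E(A_i) &= A_i - \frac{N_{1i} M_{1i}}{T_i}
  = \frac{A_i (A_i + B_i + C_i + D_i) - (A_i + B_i)(A_i + C_i)}{T_i} \\
 &= \frac{A_i D_i - B_i C_i}{T_i}, \\
\sum_i \bigl[ A_i - E(A_i) \bigr] &= \sum_i \frac{A_i D_i}{T_i} - \sum_i \frac{B_i C_i}{T_i}.
\end{aligned}
```

The total discrepancy therefore vanishes exactly when the numerator and denominator of $R$ are equal.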

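Of some other procedures for measuring over-all relative risks, the one
following also has the interesting property of being equal to unity when
$\sum A_i = \sum E(A_i)$ and therefore subject to the chi-square test:

$R_1 = \dfrac{\sum A_i \sum D_i / (\sum B_i \sum C_i)}{\sum E(A_i) \sum E(D_i) / (\sum E(B_i) \sum E(C_i))}$,

where $E(A_i) = N_{1i} M_{1i}/T_i$, $E(B_i) = N_{1i} M_{2i}/T_i$, $E(C_i) = N_{2i} M_{1i}/T_i$, and $E(D_i) = N_{2i} M_{2i}/T_i$.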

In this formula the numerator represents the crude value for the relative 
risk, which would result from pooling the data into one table and ignoring 
all subclassification on other factors. The denominator represents the 
crude value for relative risk, which would have resulted from pooling in 
the situation where all relative risks within each subclassification were 
exactly unity. Readers familiar with the "indirect" method of computing
standardized mortality ratios will recognize an analogy between
the "indirect" method and the above procedure.

The estimator $R_1$ can be seen to have a bias toward unity. One reason
is covered by the illustration which indicated that adjusted percentages 
(or frequencies) do not yield an appropriate adjusted relative risk. In 
addition, when either cases or controls have little representation in a 
subcategory, there will be lack of cross-matching and little information 
about relative risk, and the observed cell frequencies and their expecta- 
tions will be numerically close. Such results will, in the process of sum- 
mation used by the estimator, tend to force its value toward unity. This 
weakness will not be too important if the degree of cross-matching is 
roughly equal in the various subclassifications — an optimum goal one 
would normally attempt to achieve. The bias will become more pro- 
nounced as the number of control factors increases and as the prospects 
for good cross-matching become poorer. 

We used the estimator $R_1$ in a recent paper (27), knowing its potential
weaknesses. This was done to present results more nearly comparable
with those reported by other investigators using similarly biased
estimators. One set of results from this paper on lung cancer among women
illustrates the conservative behavior of estimator $R_1$ compared with $R$ as
additional factors are controlled. The relative risk ($R_1$) for epidermoid
and undifferentiated pulmonary carcinoma associated with smoking more
than one pack of cigarettes daily as compared to nonsmokers decreased
from 7.1 (controlled for age) to 5.6 (controlled for age and coffee
consumption). The corresponding figures, with $R$ as a measure of relative risk,
were 9.7 and 9.9.

Computational procedures for $R$ and $R_1$ are presented in table 1, drawing
on material comparing smoking histories of women diagnosed as cases of
epidermoid and undifferentiated pulmonary carcinoma with those of female
controls. For simplicity in presentation only two smoking levels are
considered: nonsmokers and smokers of more than one pack of cigarettes
daily. An extension of the significance testing procedures to the case of
study factors at more than two levels is discussed later. The control
factors are age and occupation. The basic data are given in the first 9
columns. Columns 10 and 11 carry the derivative calculations required
for $R$. Columns 12 and 13 are used in the computation for $R_1$ and for
the variance estimate in column 14, the latter being needed for the
chi-square test. Only columns 1 to 10, 12, and 14 would be necessary to
compute chi square, $R$ and $R_1$. Column 13 is not essential for the
computation of E(D) but simplifies computation of V(A), while providing a
check on E(A). Column 11 serves as a check on 10 and 12. A system
of checks and computations is outlined at the bottom of table 1. Not all
the computations shown would ordinarily be necessary for an analysis.
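
The column-by-column procedure can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' worksheet: each subclassification is passed as a tuple (A, B, C, D), every subclassification is assumed to contain at least two individuals, and the function name `mantel_haenszel` and the example counts are hypothetical.

```python
def mantel_haenszel(tables):
    """Corrected chi square (1 d.f.), R and R1 for a list of fourfold tables.

    Each table is (A, B, C, D): diseased with/without the factor,
    then controls with/without the factor.
    """
    sum_a = sum_b = sum_c = sum_d = 0.0
    e_a = e_b = e_c = e_d = 0.0
    var_a = ad_t = bc_t = 0.0
    for a, b, c, d in tables:
        n1, n2 = a + b, c + d          # diseased, controls
        m1, m2 = a + c, b + d          # with factor, free of factor
        t = n1 + n2
        sum_a += a
        sum_b += b
        sum_c += c
        sum_d += d
        e_a += n1 * m1 / t             # expectations under no association
        e_b += n1 * m2 / t
        e_c += n2 * m1 / t
        e_d += n2 * m2 / t
        var_a += n1 * n2 * m1 * m2 / (t * t * (t - 1))
        ad_t += a * d / t
        bc_t += b * c / t
    chi_square = (abs(sum_a - e_a) - 0.5) ** 2 / var_a
    crude = (sum_a * sum_d) / (sum_b * sum_c)
    adjustment = (e_a * e_d) / (e_b * e_c)
    return {"chi_square": chi_square, "R": ad_t / bc_t, "R1": crude / adjustment}


# Hypothetical subclassifications (assumption, for illustration only).
example = [(10, 20, 5, 40), (6, 14, 8, 52)]
print(mantel_haenszel(example))
```

For a single subclassification the adjustment factor is exactly 1, so both estimators reduce to the ordinary cross-product ratio AD/BC.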

Table 1. Illustrative computation [...] relative risk ($R$, $R_1$, $R_2$, $R_3$, and $R_4$) relating to the association of epidermoid and undifferentiated pulmonary carcinoma in women with smoking history

[The basic data, columns (1) to (9), and the age and occupation labels of the 12 subclassifications are not legible in this copy. The derivative computations follow in the order printed, in three blocks of four subclassifications.]

Derivative computations

 (10)     (11)     (12)     (13)      (14)      (15)      (16)      (17)      (18)
 AD/T     BC/T     E(A)     E(D)      V(A)     N1C/N2    N1D/N2    N2A/N1    N2B/N1

 0        0        0        7.000    0         0         2.000     0         7.000
 1.500    0.156    0.656   22.656    0.480     0.280     6.720     7.143    17.857
 2.534    0        0.466   46.466    0.380     0         9.000    16.333    32.667
 0        0        0       42.000    0         0        11.000     0        42.000

 1.636    0        1.364    4.364    0.595     0.750     2.250     8.000     0
 1.500    0.167    0.667   16.667    0.483     0.400     3.600    10.000    10.000
 1.484    0.258    0.774   21.774    0.562     0.480     5.520     8.333    16.667
 0        0.333    0.333   11.333    0.222     0.500     5.500     0        12.000

 0.714    0        0.286    9.286    0.204     0.231     0.769    13.000     0
 2.667    0.056    1.389    9.389    0.767     0.385     4.615    10.400     2.600
 0        0.231    0.231   19.231    0.178     0.300     5.700     0        20.000
 0.790    0        0.211   14.211    0.166     0         4.000     3.750    11.250

12.825    1.201    6.375  224.375    4.036     3.325    60.675    76.960   172.040   (totals)

Column formulas, where legible: (11) = (2)(4)/(9); (14) = (12)(13)/[(9) - 1.0]; (15) = (3)(4)/(6); (16) = (3)(5)/(6); (17) = (1)(6)/(3); (18) = (2)(6)/(3).

Checks: Total discrepancy, Y = ΣA - ΣE(A) = 11.625
        Σ(AD/T) - Σ(BC/T) = Σ(10) - Σ(11) = 11.625
        Σ(15) + Σ(16) = 64.000; Σ(3) = 64
        Σ(17) + Σ(18) = 249.000; Σ(6) = 249

Derivative computations:
        ΣE(B) = Σ(2) + Y = 57.625
        ΣE(C) = Σ(4) + Y = 24.625
        Σ(AT/N1) = Σ(1) + Σ(17) = 94.960
        Σ(BT/N1) = Σ(2) + Σ(18) = 218.040
        Σ(CT/N2) = Σ(4) + Σ(15) = 16.325
        Σ(DT/N2) = Σ(5) + Σ(16) = 296.675

        Adjustment factor = 1.0081
        R1 = 7.05
        R2 = 7.14
        R3 = 8.12

Figures shown are rounded from those actually calculated and consequently are
not fully consistent. Column totals and figures shown do not necessarily agree.

The corrected chi-square value of 30.66 (1 degree of freedom) would
indicate a highly significant association between epidermoid and
undifferentiated pulmonary carcinoma and cigarette smoking in women, after
adjusting for possible effects connected with age or occupation. The
value of $R$ implies that the risk of these cancers is 10.7 times as great for
women currently [...] than for women who [...]. [...] identical with the
crude relative risk, 7.10, which [...] the data with no [...]. [...] for
women currently smoking 1 pack a day or less and for occasional or
discontinued [...].

The [...] estimates of risk is also outlined in table 1. All the [...]
computations required for this purpose appear in [...]. All three estimates
are [...]; that is, the use of a [...] case and control distributions are
adjusted. If the distribution of diseased cases is taken as the standard
distribution to which the controls are adjusted, the estimator becomes

$R_2 = \dfrac{\sum A_i \sum (D_i N_{1i}/N_{2i})}{\sum B_i \sum (C_i N_{1i}/N_{2i})}$.

Estimator $R_2$ was used by Wynder et al. in a study of the association of
cervical cancer in women with circumcision status of sex partners (16).
The merit of employing the cervical cancer case-distribution as the stand- 
ard presumably rests on the fact that this distribution at least would be 
well defined by the study. 






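If the distribution of control cases is taken as standard the estimator
becomes

$R_3 = \dfrac{\sum (A_i N_{2i}/N_{1i}) \sum D_i}{\sum (B_i N_{2i}/N_{1i}) \sum C_i}$.

If the combined distribution is taken as standard the estimator becomes

$R_4 = \dfrac{\sum (A_i T_i/N_{1i}) \sum (D_i T_i/N_{2i})}{\sum (B_i T_i/N_{1i}) \sum (C_i T_i/N_{2i})}$.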




If any $N_{1i}$ or $N_{2i}$ should equal zero, the estimator $R_4$ would not be
defined, $R_2$ is not defined for any zero-valued $N_{2i}$, and $R_3$ is not defined for
any zero-valued $N_{1i}$. In these instances it would be necessary to exclude
the zero-frequency categories to define the estimators. The estimator $R_1$
retains these categories at the expense of greater bias toward unity. The
estimator $R$ gives such categories zero weight, since they contain no
information about relative risk. The chi-square significance test gives
no weight to these categories.

While $R_4$ is clearly a direct adjusted estimate of relative risk employing
the combined distribution as standard, $R_2$ and $R_3$ may be viewed
alternatively as either direct or indirect adjusted estimates. The same
estimates will result if a direct adjustment is made using the distribution of
cases as standard, or an indirect adjustment is made using the factor
incidence rates for controls as the standard rates.

It may be noted that in the example used, the values for $R_2$, $R_3$, and $R_4$
(7.14, 8.12, and 7.91, respectively) were roughly comparable to $R_1$, and
all were smaller than $R$. The example was selected because all the $N_{1i}$
and $N_{2i}$ values were non-zero, so that the values of $R_2$, $R_3$, and $R_4$ were
all defined.
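
The quoted values can be retraced from the column totals of table 1 with the short Python sketch below. Two things in it are assumptions rather than figures read from the table: the basic-data totals (18, 46, 13, and 236 for A, B, C, and D) are inferred from the check lines, and the expressions for R2, R3, and R4 are reconstructed from the verbal description of direct adjustment to the case, control, and combined distributions.

```python
# Column totals read from table 1 (columns 10 to 18).
sum_ad_t, sum_bc_t = 12.825, 1.201                    # columns 10, 11
sum_e_a, sum_e_d, sum_v_a = 6.375, 224.375, 4.036     # columns 12, 13, 14
n1c_n2, n1d_n2 = 3.325, 60.675                        # columns 15, 16
n2a_n1, n2b_n1 = 76.960, 172.040                      # columns 17, 18

# Assumption: totals of the basic data, inferred from the check lines
# of table 1 rather than read directly.
sum_a, sum_b, sum_c, sum_d = 18, 46, 13, 236

y = sum_a - sum_e_a                                   # total discrepancy
chi_square = (abs(y) - 0.5) ** 2 / sum_v_a            # corrected, 1 d.f.
r = sum_ad_t / sum_bc_t
crude = sum_a * sum_d / (sum_b * sum_c)
sum_e_b, sum_e_c = sum_b + y, sum_c + y
r1 = crude / (sum_e_a * sum_e_d / (sum_e_b * sum_e_c))
r2 = sum_a * n1d_n2 / (sum_b * n1c_n2)                # cases as standard
r3 = n2a_n1 * sum_d / (n2b_n1 * sum_c)                # controls as standard
r4 = ((sum_a + n2a_n1) * (sum_d + n1d_n2)
      / ((sum_b + n2b_n1) * (sum_c + n1c_n2)))        # combined as standard

for name, value in [("chi square", chi_square), ("R", r), ("crude", crude),
                    ("R1", r1), ("R2", r2), ("R3", r3), ("R4", r4)]:
    print(f"{name}: {value:.2f}")
```

Because the table entries are rounded, the results agree with the values quoted in the text only to within rounding in the last digit.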

The over-all relative risk estimates are averages and as averages may 
conceal substantial variation in the magnitudes of the relative risk among 
subgroups. Ordinarily, the individual subcategory data should be ex- 
amined, paying special attention to relative risks based on reasonably 
large sample sizes. This will provide protection against the potential 
deficiencies of any particular summary relative risk formula employed. 
The over-all chi-square significance test in any case will remain appropriate 
for detecting any strong general tendency for the risk of disease to be 
associated with the presence or absence of the test factor. 

The Matched-Sample Study 

The matched-sample study previously described can be considered a
special case of the classification procedure with the number of
classifications equal to the number of pairs of individuals. The status of pairs
of well and diseased individuals classified with respect to the presence or
absence of the suspect factor in each individual will be represented as
F, G, H, or J in the following fourfold table. The meanings attached to
the marginal totals A, B, C, and D are the same as those in the first
schematic representation.

                           Diseased individuals

Well individuals      With factor    Free of factor    Total

With factor                F               G             C

Free of factor             H               J             D

Total                      A               B             N

In the absence of association between the disease and the factor, we
expect the same number of individuals with the factor to appear among
both diseased and well individuals; that is, we expect $A (= F + H)$ to
equal $C (= F + G)$. This can occur only when $G = H$, and the statistical
test is simply whether or not $G$ differs significantly from 50 percent of
$G + H$. $G$ is tested as a binomial variable with parameter 1/2, $G + H$
being the number of cases. $G$ thus has expectation $\frac{1}{2}(G + H)$, variance
$\frac{1}{4}(G + H)$, and the corrected chi square with 1 degree of freedom can
readily be shown to reduce to $(|G - H| - 1)^2/(G + H)$.

Treating the data as consisting of N classifications each with $N_{1i} =
N_{2i} = 1$, $T_i = 2$ and applying the previously described procedures will
lead to the same value of chi square. For F of the N classifications,
$A_i = 1$, $M_{1i} = 2$, $M_{2i} = 0$, $E(A_i) = 1$, $V(A_i) = 0$; for G classifications
$A_i = 0$, $M_{1i} = M_{2i} = 1$, $E(A_i) = 1/2$, $V(A_i) = 1/4$; for H classifications
$A_i = 1$, $M_{1i} = M_{2i} = 1$, $E(A_i) = 1/2$, $V(A_i) = 1/4$; and for J classifications,
$A_i = 0$, $M_{1i} = 0$, $M_{2i} = 2$, $E(A_i) = 0$, $V(A_i) = 0$. Thus, $\sum A_i = F + H$,
$\sum E(A_i) = F + \frac{1}{2}(G + H)$, $\sum V(A_i) = \frac{1}{4}(G + H)$, and the resultant
corrected chi square can again be seen to be $(|G - H| - 1)^2/(G + H)$.
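
The equivalence of the two routes can be checked numerically. The Python sketch below uses hypothetical pair counts F, G, H, and J (an assumption, not data from the paper) and illustrative function names; it computes the corrected chi square once from the discordant pairs alone and once by summing over pairs treated as separate classifications.

```python
def matched_pair_chi_square(g, h):
    """Corrected chi square (1 d.f.) from the two kinds of discordant pairs."""
    return (abs(g - h) - 1) ** 2 / (g + h)


def stratified_chi_square(f, g, h, j):
    """Same test, treating every pair as its own classification (N1 = N2 = 1)."""
    sum_a = f + h                      # pairs whose diseased member has the factor
    sum_e = f * 1.0 + (g + h) * 0.5    # E(A) is 1, 1/2, 1/2, 0 for F, G, H, J pairs
    sum_v = (g + h) * 0.25             # V(A) is 1/4 for G and H pairs, 0 otherwise
    return (abs(sum_a - sum_e) - 0.5) ** 2 / sum_v


# Hypothetical pair counts (assumption, for illustration only).
F, G, H, J = 10, 5, 15, 20
print(matched_pair_chi_square(G, H), stratified_chi_square(F, G, H, J))
```

The concordant pairs (F and J) contribute nothing to the variance, which is why only G and H appear in the reduced formula.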

It is of interest to observe that the summary chi-square formula is 
appropriate in the matched-sample case, even though the frequencies for 
each of the separate subclassifications are small. Its appropriateness, 
despite the small frequencies, stems from the fact that it is a test on a 
summation of random variables, $A_i$, and thus tends to approach normality
rapidly, making the chi-square test valid, even though the individual
$A_i$'s are not normally distributed. This property of the chi-square
formula applies in the general classification as well as the matched-sample 
situation. Only substantial lack of cross-matching in the general case 
would tend to make the chi-square test invalid. It is also essential, of 
course, that there be some appreciable variation in the presence or absence 
of the factor under study. 

It should be noted that in the matched-sample study with $T_i = 2$ for
each of the N pairs of individuals, the variances of the $A_i$'s would have
been understated by a factor of 2, had $T - 1$ been replaced by $T$ in the
variance formulas. The usual formula for chi square does essentially
make this replacement, but it is usually of little consequence if $T$ is of
any reasonable magnitude. The formulas for relative risk in the
matched-sample study reduce simply to the following: $R = H/G$; $R_1 = R_2 = R_3 =
R_4 = AD/BC$.

Study Factors at More Than Two Levels 
The preceding discussion on the analysis of retrospective data has been
in terms of the test factor under study taking only two values. This
framework has sufficed for discussion of the underlying statistical ideas
and issues. In practice, the study factor will frequently take on more
than two, perhaps many, potential values. When the number of study
factor values is large, grouping can reduce them to a manageable number.

The need to consider only a limited number of classes for the study
factor stems from the fact that, when an association is anticipated, most
of the significant information about the association will come from the
results for the more extreme values of the study factor. While it is
efficient to concentrate attention on the test factor classes expected to
show the greatest differences in association with the disease, it is also
profitable to consider intermediate values for the test factor to seek
evidence for a consistent pattern of association. For example, in table 1,
a highly significant difference between nonsmokers and women currently
smoking more than 1 pack of cigarettes daily was illustrated. Inclusion
of data for smokers of 1 pack or less a day showing results intermediate
between the other classes would have added little, if anything, to the
statistical significance of the results, and might actually lower it, if one
made an over-all test of the differences among the three smoking classes.
However, the observation that the intermediate smoking class does, in
fact, show an intermediate relative risk contributes to an orderly pattern
and increases our confidence in the conclusions suggested by the data for
the remaining two classes.

For any two particular test-factor levels, the relative risk for one over
the other may be calculated using only the data pertaining to those two
levels or by using the results for all test levels. In the formulas
previously given for $R$, $R_1$, $R_2$, $R_3$, and $R_4$, the difference between the two
calculating procedures is simply one of setting the values of $N_{1i}$, $N_{2i}$, and
$T_i = N_{1i} + N_{2i}$ in terms of number of cases and controls occurring at
the two study-factor levels only, or defining them in terms of total number
of cases and controls in the entire study. When total cases and controls
are used in defining $N_{1i}$, $N_{2i}$, and $T_i$, it can be shown that for $R_1$, $R_2$, $R_3$,
and $R_4$ the various relative risks will be internally consistent with each
other. If the relative risk for the first level is twice that for the second
level, which in turn is twice that for the third level, then the relative risk
for the first level will be four times that of the third. These exact
relationships do not hold for $R$ as an estimator of relative risk, and a somewhat
sophisticated extension of the formula for $R$ would be required to secure
this property.

The problem of obtaining a summary chi square when the study factor
is at more than two levels is complicated by the fact that the deviations
from expectation at the various study-factor levels are intercorrelated.
When there are but two levels, the two deviations will have perfect
negative correlation, and attention need be directed to only one of the
deviations. Irrespective of the number of levels, at any one level the deviation
from expectation among diseased persons will be equal, but opposite in
sign, to the deviation from expectation among controls, so that attention
can be confined to the deviations for diseased persons.

The problem can be stated as one of reducing a set of correlated
deviations into a summary chi square. Table 2 applies this process for
obtaining a summary chi square to the study of the association of epidermoid
and undifferentiated pulmonary carcinoma in women and maximum
cigarette-smoking rate, classified into three levels, after adjustment for
age and occupation.

The general expressions for the expectations and variances of the
number of cases at a particular test-factor level are given in the lower
right section of table 2. Also shown is the expression for the covariance
between the number of cases at two different test-factor levels. Since
the total of all the deviations is zero, one would in general need the
variances of, and covariances between, the number of cases at all but one of
the levels. The number of covariance terms will rise sharply as the
number of test levels is increased. At 3 test levels there are 2 variance
terms and 1 covariance term, while at 10 test levels, there would be 9
variances and 36 covariance terms of interest.
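
The counts quoted here follow from simple combinatorics: with k test levels, one level is redundant, leaving k - 1 variances and one covariance for each pair among those k - 1 levels. A small Python sketch (the function name `terms_needed` is illustrative):

```python
from math import comb


def terms_needed(levels):
    """Variance and covariance terms for all but one of the test-factor levels."""
    free = levels - 1              # deviations sum to zero, so one level is redundant
    return free, comb(free, 2)     # (variances, covariances)


for k in (3, 10):
    print(k, terms_needed(k))
```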

For the general case the burden of computation could be heavy. After
all the necessary computation for the deviations, their variances and
covariances, there would still remain the problem of converting these,
presumably by matrix methods, into a summary chi square. Since the
retrospective problem will normally involve only a limited number of
test-factor levels, precise procedures will be given only for the three-level
situation, and approximate procedures outlined for the general case.

The exact computation procedure for the three-level case is detailed
in table 2. Lines (1), (2), and (4) show the total observed and expected
frequencies and variances of the number of cases (and controls) at each
of the three smoking-rate levels, after adjusting for age and occupation.
These are the summary totals over each subclassification obtained by
application of the formulas appearing in table 2.

Lines (5) and (6) give the chi squares corresponding to the total
deviation from expectation at each of the smoking-rate levels. The chi squares
in line (5) are corrected for continuity. They relate to the difference
of the particular level to which they apply, from the two other levels
combined. Following the usual practice of making no continuity
corrections when chi squares with more than 1 degree of freedom are under
consideration, line (6) shows the uncorrected chi squares.

[Table 2, giving the computation of the summary chi square for three smoking-rate levels, is not legible in this copy.]

The computing procedure of table 2 takes advantage of the fact that,
since the sum of the deviations from expectation is zero, the variance
of the third deviation must equal the sum of the other two variances
plus twice the covariance for the first two deviations. The covariance
of the first two deviations is readily obtained as illustrated and is used
in calculating the summary chi square. The summary chi square is
obtained as the sum of squares of two orthogonal deviates, with each
square adjusted for its own variance. The first deviate squared is simply
the uncorrected chi square at the first level in line (6), the variance of
the deviate remaining as initially calculated. The second deviate is the
deviation at the second level adjusted for its correlation with the first
deviation [adjusted $Y_2 = Y_2 - b_{21} Y_1$; $b_{21}$ = covariance $(Y_1, Y_2)$/variance
$Y_1$]. The variance of the adjusted second deviate is the initial value
reduced by that portion of the variation accounted for by the first
deviation [Var. (adjusted $Y_2$) = variance $Y_2$ - covariance$^2 (Y_1, Y_2)$/variance
$Y_1$].
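
The orthogonalization just described can be sketched in Python. The inputs below are hypothetical (an assumption; they are not the entries of table 2), and the function names are illustrative. The second function computes the same quantity as a quadratic form in the inverse of the 2 x 2 covariance matrix, which is one way to confirm that the two-deviate construction is exact.

```python
def summary_chi_square_3_levels(y1, y2, v1, v2, cov12):
    """Two-d.f. chi square from the first two of three correlated deviations."""
    b21 = cov12 / v1                       # regression of Y2 on Y1
    y2_adj = y2 - b21 * y1                 # adjusted second deviate
    v2_adj = v2 - cov12 ** 2 / v1          # variance of the adjusted deviate
    return y1 ** 2 / v1 + y2_adj ** 2 / v2_adj


def quadratic_form(y1, y2, v1, v2, cov12):
    """Same quantity via the inverse of the 2 x 2 covariance matrix."""
    det = v1 * v2 - cov12 ** 2
    return (v2 * y1 ** 2 - 2 * cov12 * y1 * y2 + v1 * y2 ** 2) / det


# Hypothetical inputs (assumption): not the values of table 2.
y1, y2, v1, v2, cov12 = -9.0, 2.5, 6.0, 5.0, -3.0
v3 = v1 + v2 + 2 * cov12     # variance of the third deviation, Y3 = -(Y1 + Y2)
print(summary_chi_square_3_levels(y1, y2, v1, v2, cov12),
      quadratic_form(y1, y2, v1, v2, cov12), v3)
```

Which level is taken first does not affect the result, since the quadratic form is symmetric in the two deviations.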

In the present instance the summary chi square with 2 degrees of
freedom is 28.43 [line (11)]. This presumably is close to the chi square
with 1 degree of freedom which would have obtained had only the two
most extreme smoking classes been compared. If one examines the
individual uncorrected chi squares [line (6)], their total is found to be
45.55, the maximum individual figure being 23.42. It will necessarily be
true that the summary chi-square value will lie between the largest of the three
chi squares and their total. At almost any reasonable probability level these
limits would be sufficient to establish statistical significance without further
calculation. In our companion paper (27) this rule sufficed in almost all
instances to separate the significant from the nonsignificant results.

Comments on Extensions to More Than Three Factors 

Two procedures can be suggested for getting approximate summary 
chi squares, when there are a large number of levels for the test factors, 
without the burden of computation that the exact method would entail. 
Both methods calculate the approximate summary chi square as a sum of 
squares of approximately orthogonal standardized deviates. 

In the first method one computes an uncorrected chi square with 1 degree
of freedom for the difference of the first level from all the remaining levels
combined (the same first step as in the illustration for the three-level case).
Discarding the data from the first level, a second chi square is computed
for the difference between the second test-factor level and the remaining
levels combined. This is done successively up to and including the last
two remaining levels. The approximate summary chi square is then the
sum of the separate chi squares with the number of degrees of freedom
being one less than the number of test levels.
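
A minimal Python sketch of this first approximate method follows. It is an interpretation under stated assumptions: each subclassification is given as two lists of counts per test-factor level (cases and controls), the variance of the number of cases at a level is taken in the same form as for the two-level test, and the function name and example counts are hypothetical.

```python
def successive_chi_square(strata):
    """Approximate summary chi square by successive level-vs-rest comparisons.

    Each stratum is (cases, controls), two equal-length lists of counts
    per test-factor level. Returns (chi_square, degrees_of_freedom).
    """
    levels = len(strata[0][0])
    total = 0.0
    for j in range(levels - 1):
        deviation = variance = 0.0
        for cases, controls in strata:
            c, d = cases[j:], controls[j:]      # discard the earlier levels
            n1, n2 = sum(c), sum(d)
            t = n1 + n2
            if t < 2:
                continue                         # no information in this stratum
            m = c[0] + d[0]                      # individuals at level j
            deviation += c[0] - n1 * m / t       # observed minus expected cases
            variance += n1 * n2 * m * (t - m) / (t * t * (t - 1))
        if variance > 0:
            total += deviation ** 2 / variance   # uncorrected, 1 d.f. each
    return total, levels - 1


# Hypothetical data (assumption): two subclassifications, three levels.
strata = [([4, 6, 10], [20, 15, 5]), ([3, 5, 12], [25, 10, 5])]
print(successive_chi_square(strata))
```

With only two levels the procedure reduces to the uncorrected chi square for a single comparison.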

Exactly orthogonal standardized deviates would be obtained if, in the 
summary analysis, as each successive total deviation from expectation 
were evaluated, it was adjusted for its multiple regression on the preceding 
deviations, and then standardized by the adjusted variance. This, of 
course, would no longer be a simplified approximate procedure. However, 
it can be shown that for a single classification, in the multiple regression of 
any deviation from expectation on any subset of deviations, the regression 
coefficients will all be equal; the multiple regression on the set of deviations 
will be the same as the simple regression on their sum. The equality of 
regression coefficients, while holding true exactly for deviations in the
separate subclassifications, will hold only approximately for the total
deviations from expectation (it would hold exactly if equal numbers of
individuals were observed from level to level at each subclassification). 
Nevertheless, this result suggests that approximately orthogonal deviates 
would be obtained if, in evaluating each successive total deviation, it were 
adjusted for the cumulative total of deviations already evaluated. Com- 
puting procedures to accomplish this can readily be devised. 

Both approximate chi-square procedures just outlined, which may have 
merit when more than three groups are being compared simultaneously, 
should, in theory, yield linear combinations of independent chi squares. 
While testing the chi-square values obtained as though they were exact 
is not likely to be too inappropriate, it may be more correct to obtain a 
modified number of degrees of freedom, along the lines suggested by 
Satterthwaite (47) for problems involving such linear combinations. 
What the modified number of degrees of freedom would be has not been 
investigated by us, and it may prove as easy to apply the exact chi-square 
procedure, indicated later, as to determine the appropriate degrees of 
freedom for the approximate chi square. 

It is of interest that a somewhat similar task of obtaining an appropriate 
summary chi square appears in the birth-order problems described by 
Halperin (48). There, it was necessary to compare a set of total observa- 
tions (across family sizes) with a set of total expectations, one for each 
birth order. Halperin described a matrix-inversion procedure for reducing 
the set of correlated deviations into a summary chi square. In that 
problem it can be shown that all the regression coefficients are equal in 
the multiple regression of the deviation at a particular birth order on the 
set of deviations at all succeeding birth orders. The second approximate 
method described previously for the present problem could thus be used 
exactly for the birth-order problem, permitting simplified computation of 
chi square. The procedure indicated by Halperin has the advantage of 
generality and could be applied to the current and related problems, if 
one obtained all the necessary variances and covariances and inverted 
the resulting matrix. 

References 

(1) Snow, J.: On the mode of communication of cholera. In Snow on Cholera.
    New York, The Commonwealth Fund, 1936, pp. 1-139.
(2) Holmes, O. W.: The contagiousness of puerperal fever. In Medical Classics.
    Baltimore, Williams & Wilkins Co., vol. 1, 1936, pp. 211-243.
(3) Stern, R.: Nota sulle ricerche del dottore Tanchon intorno la frequenza del
    cancro. Annali Universali di Medicina 110: 484-503, 1844.
(4) Stocks, P., and Campbell, J. M.: Lung cancer death rates among non-smokers
    and pipe and cigarette smokers. Brit. M. J. 2: 923-929, 1955.
(5) Wynder, E. L., and Cornfield, J.: Cancer of the lung in physicians. New
    England J. Med. 248: 441-444, 1953.
(6) Lane-Claypon, J. E.: A further report on cancer of the breast, with special
    reference to its associated antecedent conditions. Rept. Publ. Health & M.
    Subj., No. 32, 1926, pp. 1-189.
(7) Clemmesen, J., Lockwood, K., and Nielsen, A.: Smoking habits of patients
    with papilloma of urinary bladder. Danish M. Bull. 5: 123-128, 1958.
(8) Denoix, P. R., and Schwartz, D.: Tobacco and cancer of the bladder. (Bulletin
    de l'Association francaise pour l'etude du Cancer.) Cancer 43: 387-393, 1956.

(9) [...] Levin, M. L., and Moore, G. E.: The association of smoking [...].
    A.M.A. Arch. Int. Med. [...], 195[...].
(10) Mustacchi, P., and Shimkin, M. B.: [...]. J. Nat. Cancer Inst. 20: [...], 1958.
(11) Lilienfeld, A. M.: The relationship of cancer of the female [...]
    menopause and marital status. [...]
(12) [...] National Cancer Conference. Philadelphia, [...].
(13) [...]: An epidemiological study on cancer in Japan. Gann [...].
(14) [...], J., Thomas, L. B., Edgcomb, J. H., and Stewart, H. L.: Some
    environmental factors and the development of uterine cancers in Israel and
    New York City. To be published in Acta Unio internat. contra cancrum.
(15) [...]: [...] of the uterine cervix and social conditions. Brit. J. Cancer [...].
(16) Wynder, E. L., Cornfield, J., Schroff, P. D., and Doraiswami, K. [...]: A
    study of environmental factors in carcinoma of the cervix. Am. J. Obst. & [...].
(17) [...], M. M.: Tobacco smoking and cancer of the
    mouth and respiratory system. Cancer Res. 10: 539-542, 1950.
(18) Wynder, E. L., Bross, I. J., and Day, E.: A study of environmental factors in
    cancer of the larynx. Cancer 9: 86-110, 1956.
(19) [...] Carroll, B. E.: Some epidemiological aspects of leukemia
    in children. J. Nat. Cancer Inst. 19: 1087-1094, 1957.
(20) Breslow, L., Hoaglin, L., Rasmussen, G., and Abrams, H. K.: Occupations
    and cigarette smoking as factors in lung cancer. Am. J. Pub. Health 44: [...].
(21) Doll, R., and Hill, A. B.: A study of the aetiology of carcinoma of the lung.
    Brit. M. J. 2: 1271-1286, 1952.
(22) Levin, M. L.: Etiology of lung cancer; present status. New York J. Med. 54:
    769-777, 1954.
(23) [...] and Cornfield, J.: The statistical association
    between smoking and carcinoma of the lung. [...]
(24) [...] and Co[...], A. J.: Lung cancer and smoking. Am. J. Surg. 89: [...].
(25) Wynder, E. L., and Graham, E. A.: Tobacco smoking as possible etiologic factor
    in bronchiogenic carcinoma. J.A.M.A. 143: 329-336, 1950.
(26) Wynder, E. L., Bross, I. J., Cornfield, J., and O'Donnell, W. E.: Lung cancer
    in women. New England J. Med. 255: 1111-1121, 1958.
(27) Haenszel, W., Shimkin, M. B., and Mantel, N.: A retrospective study of lung
    cancer in women. J. Nat. Cancer Inst. 21: 825-842, 1958.
(28) [...] Bentall, H. H., and Roberts, J. A. [...]: A relationship between cancer
    of stomach and the ABO blood groups. Brit. M. J. 1: 799-801, [...].
(29) Buckwalter, J. A., Wohlwend, C. B., Colter, D. C., Tidrick, R. T., and
    [...], L. A.: The association of the ABO blood groups to gastric
    carcinoma. Surg. Gynec. & Obst. 104: 176-179, 1957.
(30) [...], M. L., and [...] associations
    with gastric cancer. Am. J. Pub. Health 47: 961-970, 1957.
(31) Cornfield, J.: A method of estimating comparative rates from clinical data.
    Applications to cancer of the lung, breast, and cervix. J. Nat. Cancer Inst. 11: [...].
(32) Dorn, [...]: Some applications of biometry in the collection and evaluation of
    medical data. J. Chron. Dis. 1: 638-664, 1955.

(33) Neyman, J.: Statistics — servants of all sciences. Science 122: 3166, 1955.

(34) Berkson, J.: Limitations of the application of fourfold table analysis to hospital
data. Biometrics Bull. 2: 47-53, 1946.

(35) White, C.: Sampling in medical research. Brit. M. J. 2: 1284-1288, 1953.

(36) Greenwood, M., and Yule, G. U.: On the determination of size of family and of
the distribution of characters in order of birth from samples taken through
members of the sibships. Roy. Stat. Soc. J. 77: 179-197, 1914.

(37) Haenszel, W.: Variation in incidence of and mortality from stomach cancer with
particular reference to the United States. J. Nat. Cancer Inst. 21: 213-262,
1958.

(38) Videbaek, A., and Mosbech, J.: The aetiology of gastric carcinoma elucidated
by a study of 302 pedigrees. Acta med. scandinav. 149: 137-159, 1954.

(39) Whelpton, P. K., and Freedman, R.: A study of the growth of American
families. Am. J. Sociol. 61: 595-601, 1956.

(40) Levin, M. L., Goldstein, H., and Gerhardt, P. R.: Cancer and tobacco
smoking. J.A.M.A. 143: 336-338, 1950.

(41) Levin, M. L., Kraus, A. S., Goldberg, I. D., and Gerhardt, P. R.: Problems
in the study of occupation and smoking in relation to lung cancer. Cancer 8:
932-936, 1955.

(42) Lilienfeld, A. M.: Possible existence of predisposing factors in the etiology of
selected cancers of nonsexual sites in females. A preliminary inquiry. Cancer
9: 111-122, 1956.

(43) Winkelstein, W., Jr., Stenchever, M. A., and Lilienfeld, A. M.: Occurrence
of pregnancy, abortion and artificial menopause among women with coronary
artery disease: a preliminary study. J. Chron. Dis. 7: 273-286, 1958.

(44) Billington, B. P.: Gastric cancer—relationships between ABO blood-groups,
site, and epidemiology. Lancet 2: 859-862, 1956.

(45) Schwartz, and Anguera, G.: Une cause de biais dans certaines enquêtes
médicales: le temps de séjour des malades à l'hôpital. Communication à
l'Institut International de Statistique, 30ème Session. Stockholm, 1957.

(46) Cornfield, J.: A statistical problem arising from retrospective studies. Proc.
Third Berkeley Symposium on Mathematical Statistics and Probability 4:
135-148, 1956.

(47) Satterthwaite, F. E.: Synthesis of variance. Psychometrika 6: 309-316, 1941.

(48) Halperin, M.: The use of χ² in testing effect of birth order. Ann. Eugenics 18:
99-106, 1953.



Original Paper 



Hum Hered 1992;42:337-346 



Joseph D. Terwilliger
Jurg Ott

Department of Genetics and 
Development, 
Columbia University, 
New York, N.Y., USA 



A Haplotype-Based 'Haplotype 
Relative Risk' Approach to 
Detecting Allelic Associations 



Keywords 

Linkage disequilibrium 
Haplotype relative risk 
Allelic association
Chi-square tests



Abstract 

A novel variation of the Haplotype Relative Risk (HRR) of
Rubinstein et al. [Hum Immunol 1981;3:384] is proposed, in order
to glean increased information about linkage disequilibrium
or allelic associations by analyzing haplotype-based data
rather than genotypic data. It is shown that statistical tests
based on our design give much higher power than those based
on the original HRR approach. Several additional nonparametric
tests based on the same data are analyzed, and power is
computed for each of them. Further, parametric likelihood
methods are applied to testing linkage equilibrium, and
estimating δ, the coefficient of linkage disequilibrium, from the
same data.



Introduction 

Allelic associations between etiologically unrelated traits were
originally detected in humans through observations at the genotypic
level. In the 1950s, it was noticed that in individuals with certain
diseases there were significant excesses of certain blood groups.
Aird et al. [1, 2] demonstrated the presence of a significant
association between blood group A and stomach cancer, and between
blood group O and peptic ulcer, while Pike and Dickens [3] found such
an association between blood group O and toxemia of pregnancy, and
McConnell et al. [4] studied associations between blood groups and
carcinoma of the lung. Woolf [5] then proposed his Relative Risk
statistic to compare the incidence rates in given blood groups in a
case-control type of study, in which one would collect a sample of
people with the disease and compare the observed frequency of the
'risk allele' with its frequency in a separate sample of healthy
individuals (or population frequency, if known).

One problem with this method is that there is no way of knowing
whether a significant result is biologically meaningful or just a
consequence of having the case and control samples taken from
different genetic populations in which the frequency of the risk
allele is different, so that no real association exists. To attempt
to circumvent this problem, Rubinstein et al. [6] proposed the
Haplotype Relative Risk (HRR) statistic, based on earlier work of
C.A.B. Smith, to ensure that the control and disease samples were
well-matched, from the same population, so that any observed
association would have to be due to a real allelic association of
some sort. This experimental design has also been used in the
haplotype frequency difference statistic of Seuchter et al. [7].

Joseph D. Terwilliger
722 West 168th St., Box 58
New York, NY 10032 (USA)

Experimental Design 

H = Marker allele with which disequilibrium is hypothesized.
H̄ = Any allele other than H at the marker locus.
δ = Gametic linkage disequilibrium coefficient;
  = P(AB gamete) − P(A)P(B) (A at one locus, B at the other).
θ = Recombination fraction between marker and disease loci.
p = Gene frequency of the disease allele.
q = Gene frequency of the H allele.
n = Sample size.

In order to be sure one has matched control and disease samples,
Rubinstein et al. [6] proposed using data from nuclear families with
one affected offspring to test for deviations from linkage
equilibrium. They recommended using the affected offspring's genotype
(made up of alleles transmitted from parents to the affected child)
at a marker locus as the 'case' sample, and an artificial genotype
made up of the alleles not transmitted to the child from its parents
as the 'control' sample in an association test. Then they used such
data to test whether the H allele was present equally frequently in
diseased individuals' genotypes and the nontransmitted control
genotypes. For example, in a family with unaffected parents with
genotypes G/H and I/J at the marker locus, and an affected child with
marker genotype H/I, the transmitted genotype would be H/I, and the
artificial nontransmitted genotype would be G/J. Since they were only
interested in



Table 1. Data collected in a haplotype relative risk
study (either HHRR or GHRR)

Transmitted      Not transmitted      Total
                 H         H̄
H                A         B          W
H̄                C         D          X
Total            Y         Z          N

In the 2 × 2 table shown here, each cell corresponds to one parent.
In the HHRR, each parent transmits one allele, and not the other, and
can thus be classified by which allele was, and which was not,
transmitted to the affected offspring. In the GHRR, each set of
parents has 4 alleles, 2 of which are transmitted to the affected
child, and 2 which are not. If the child contains 1 or 2 H alleles,
we say H was transmitted, and if there is an H allele in the
remaining 2 alleles, we say that H was nontransmitted. Thus, each
family either transmits H or H̄, and has either H or H̄ among the
nontransmitted alleles, and can therefore also be characterized by
one cell of this table.



Table 2. Haplotype relative risk

                   H         H̄         Total
Transmitted        W         X         N
Not transmitted    Y         Z         N
Total              W + Y     X + Z     2N

The data in this table are taken directly from the marginals of
table 1, and represent the form of the originally proposed GHRR
statistic. This table, of course, can be filled with either
haplotype- or genotype-based data. All variable names are the same as
in table 1.



whether H was present or absent from the genotypes, in this example
we have H transmitted, and H̄ not transmitted (genotype G/J does not
contain H). For every such nuclear family there would be one such
observation. One can then tabulate such observations in the form of
table 1. The example family above would fall in cell B. Ott [8]
demonstrated that under the null hypothesis of δ = 0, the transmitted
and nontransmitted






alleles are independent of each other, and thus we can treat our
transmitted and nontransmitted samples independently and represent
them in the form of table 2 (marginals of table 1). Then a standard
χ² test of independence on this table can be shown to be a valid χ²
test of the hypothesis δ = 0. This is the test proposed by Rubinstein
et al. [6] to guarantee the control and disease samples are
genetically well-matched.

As is shown below, the statistical method of Rubinstein et al. [6]
does not take advantage of all the information present in the data.
Their method lumps H/H homozygotes and H/H̄ heterozygotes together as
H genotypes. However, since under the null hypothesis the two
parental genotypes are independent, it is possible to treat each
parent as an independent observation, and merely look at the fate of
each parental marker allele. So, in the example family above, there
would be one observation of H transmitted, G not transmitted, and one
observation of I transmitted, J not transmitted, which in table 1
(now referring to alleles, not genotypes) would contribute one
observation to cell B, and one observation to cell D. Again, for
theoretical reasons given by Ott [8], transmitted and nontransmitted
alleles are independent of each other, and can be collapsed, as in
the Rubinstein case, into table 2, in which the example family would
contribute one observation to cell W, one to cell X, and two to cell
Z, the marginal values of table 1. We are thus using more of the
information present in the family, obtaining twice as many
observations from the same amount of data.
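The counting scheme described above can be sketched in code. This is only a minimal illustration, assuming each parent is recorded as a (transmitted allele, nontransmitted allele) pair; the function names `cell`, `hhrr_counts`, and `ghrr_counts` are hypothetical and do not come from the paper.

```python
def cell(transmitted_has_h, nontransmitted_has_h):
    """Return the table 1 cell label for one observation."""
    if transmitted_has_h:
        return "A" if nontransmitted_has_h else "B"
    return "C" if nontransmitted_has_h else "D"


def hhrr_counts(families, risk="H"):
    """Haplotype-based counts: one observation per parent."""
    counts = {k: 0 for k in "ABCD"}
    for family in families:
        for transmitted, nontransmitted in family:
            counts[cell(transmitted == risk, nontransmitted == risk)] += 1
    return counts


def ghrr_counts(families, risk="H"):
    """Genotype-based counts: one observation per family."""
    counts = {k: 0 for k in "ABCD"}
    for family in families:
        h_transmitted = any(t == risk for t, _ in family)
        h_nontransmitted = any(n == risk for _, n in family)
        counts[cell(h_transmitted, h_nontransmitted)] += 1
    return counts


# Example family from the text: parents G/H and I/J, affected child H/I.
example = [[("H", "G"), ("I", "J")]]
print(hhrr_counts(example))  # one observation in B, one in D
print(ghrr_counts(example))  # one observation in B
```

In the haplotype-based tabulation the example family yields two observations (cells B and D), whereas the genotype-based tabulation yields a single observation in cell B, which is the doubling of observations the text refers to.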

Recessive Disease 

Haplotype-Based versus Genotype-Based
HRR χ² Tests

We first compared the power of our haplotype-based HRR (HHRR)
statistic with the genotype-based HRR (GHRR) of Rubinstein et al.
[6]. The test we applied to each data set is essentially a χ² test of
independence on table 2 for the haplotype-based data (HHRR test), and
for the equivalent genotype-based table (GHRR test) in which
discrimination is between genotypes with no H allele, and those with
at least one (possibly two). Power calculations were performed for
each test, assuming a recessive disease with no phenocopies
(penetrance is irrelevant to the calculations, according to Ott [8]),
by analytically computing the probability of a significant χ² test
result (χ² > 3.84 at the 0.05 level) for different combinations of
δ/p (δ and p are completely confounded according to Ott [8]), q, and
θ. Power curves for these two tests (n = 100 families, q = 0.5) are
given in figure 1 for varying true values of θ and δ/p. In all the
numerical cases we considered, the HHRR test was more powerful than
the GHRR approach of Rubinstein et al. [6]. This is intuitively
satisfying, since the HHRR approach discriminates between H/H
homozygotes and H/H̄ heterozygotes, while the GHRR does not. Thus, our
approach uses all of the information in the data, where the
traditional GHRR does not.

The test of independence on table 2 is a test of E[W] = E[Y].
However, W and Y are obtained from the marginals of table 1. So, when
we are testing E[W] = E[Y], we are essentially testing E[A + B] =
E[A + C], which is the same as E[B] = E[C]. Clearly this is expected
under the null hypothesis of no disequilibrium. Using the data from
table 1, the HHRR χ² is computed as

$$
\chi^2 = \frac{2N(B-C)^2}{(2A+B+C)(2N-2A-B-C)}
       = \frac{2N(WZ-XY)^2}{(W+X)(W+Y)(X+Z)(Y+Z)},
$$

the standard χ² test of independence on a 2 × 2 table. This is a
valid χ² test, of the form (B−C)²/Var[B−C], since Var[B−C] =
2Nq(1−q), which is estimated by 2N[(2A+B+C)/(2N)][1−(2A+B+C)/(2N)].
The power is shown graphically in figure 2 for n = 50 families (for
comparison with other haplotype-based tests below).
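As a numerical check on the two equivalent forms of the HHRR statistic, the following sketch computes both from the cell counts of table 1. The function names and the example counts are illustrative assumptions, not data from the paper.

```python
import math


def hhrr_chi2(a, b, c, d):
    """HHRR chi-square from table 1 cells (N = number of parents)."""
    n = a + b + c + d
    h = 2 * a + b + c  # number of H alleles among all 2N alleles
    denom = h * (2 * n - h)
    if denom == 0:
        raise ValueError("H is absent or fixed; the statistic is undefined")
    return 2 * n * (b - c) ** 2 / denom


def hhrr_chi2_from_marginals(a, b, c, d):
    """Same statistic written as the 2 x 2 test on table 2."""
    w, x, y, z = a + b, c + d, a + c, b + d
    two_n = w + x + y + z
    return two_n * (w * z - x * y) ** 2 / ((w + x) * (w + y) * (x + z) * (y + z))


# Hypothetical counts for 100 parents (50 families).
stat = hhrr_chi2(30, 40, 20, 10)
alt = hhrr_chi2_from_marginals(30, 40, 20, 10)
print(stat, alt, stat > 3.84)
```

Both forms agree because WZ − XY = N(B − C) and W + X = Y + Z = N.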

McNemar Tests 

Since our null hypothesis is B = C in a 
paired sampling (transmitted allele, nontrans- 
mitted allele) test, one's first intuition might 



Fig. 1. Power curves (analytically computed) for χ² tests based on
the haplotype- ( — ) and genotype-based ( ) HRR designs (100
families), for q = 0.5. If p = 0.5, then all values of δ/p shown are
possible. For other values of p, different restrictions apply, but
have no effect on the power curve. The upper two lines are for the
power of the test when θ = 0, and the lower set of two lines
correspond to θ = 0.20. Note that the haplotype-based design yields
higher power for all true values of θ and δ/p.
[Figure: Power plotted against Delta/p]

Fig. 2. Power curves (analytically computed) for the HHRR test (50
families) for q = 0.5, with θ = 0 (upper curve), 0.2 (middle curve)
and 0.5 (lower curve).
[Figure: Power plotted against Delta/p]



be to apply a McNemar test, (B−C)²/(B+C). In order for this to also
be a valid χ² test, (B+C) would have to be an estimate of the
variance of (B−C), which we already have shown to be 2Nq(1−q). Our
HHRR χ² test uses all the data to estimate q, including the
information from homozygous individuals, while in the McNemar test,
all homozygotes are ignored, and the variance is estimated as (B+C).
Clearly E[C] = E[B] = Nq(1−q) under the null hypothesis (δ = 0), so
(B+C) then estimates 2Nq(1−q). However, in every numerical case we
considered, this test was less powerful than the HHRR test, as shown
in figure 3, due to the fact that the HHRR uses all of the data to
estimate the variance, while the McNemar uses only the information
from heterozygous parents.
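The McNemar statistic uses only the discordant cells B and C. A minimal sketch follows, with a hypothetical function name and made-up counts.

```python
import math


def mcnemar_chi2(b, c):
    """McNemar statistic (B - C)^2 / (B + C) on the discordant cells."""
    if b + c == 0:
        raise ValueError("no heterozygous (discordant) parents")
    return (b - c) ** 2 / (b + c)


# Hypothetical discordant counts: 40 parents in cell B, 20 in cell C.
stat = mcnemar_chi2(40, 20)
print(stat)
```

Only the variance estimate differs from the HHRR statistic: here it is B + C, whereas the HHRR estimates 2Nq(1−q) from all four cells.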

Independence Tests 

An interesting result of Ott [8] is that transmitted and
nontransmitted alleles are independent when δ = 0 or when θ = 0. In
light of this, one could use an independence test on table 1 as a
test of δ = 0, though clearly when θ is close to 0, this test should
not be useful. The hypothesis tested is just that (AD−BC) = 0.
Therefore, the test should be (AD−BC)²/Var(AD−BC), which is the
standard χ² test of independence on a 2 × 2 table, N(AD−BC)²/(WXYZ).
Power was analytically computed for this test, under the recessive
model, for various true values of q, δ/p, and θ, which are
graphically presented in figure 4. In this test, the power increases
as θ increases, just the opposite behavior from the HHRR and McNemar
tests. This test may thus be a useful way to use such nuclear family
data to test δ = 0 when θ is known to be quite large, since when
θ = 0.5, the HHRR tends to 0 [8].

Fig. 3. Power curves (analytically computed) for the haplotype-based
McNemar (HMCN) test (50 families) for q = 0.5, with θ = 0 (upper
curve), 0.2 (middle curve), and 0.5 (lower curve).
[Figure: Power plotted against Delta/p]

Fig. 4. Power curves (analytically computed) for the haplotype-based
independence test (HIND) for 50 families, q = 0.5, and θ = 0 (lower
curve), 0.2 (middle curve), and 0.5 (upper curve).
[Figure: Power plotted against Delta/p]
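The independence statistic N(AD−BC)²/(WXYZ) can be computed directly from the four cells of table 1. This is a sketch only; the function name and the example counts are assumptions made for illustration.

```python
import math


def independence_chi2(a, b, c, d):
    """Standard 2 x 2 independence chi-square on table 1."""
    n = a + b + c + d
    w, x, y, z = a + b, c + d, a + c, b + d  # marginal totals
    denom = w * x * y * z
    if denom == 0:
        raise ValueError("a marginal total is zero; the statistic is undefined")
    return n * (a * d - b * c) ** 2 / denom


# Hypothetical counts for 100 parents.
stat = independence_chi2(30, 40, 20, 10)
print(stat)
```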

This independence test, however, fails to 
impose the restriction that the frequency of 
the H allele be equal in both the transmitted 
and nontransmitted samples. To include this 



Fig. 5. Power curves (analytically computed) for the test of fit to
the expected multinomial proportions (HIID) of haplotype-based data
for 50 families, q = 0.5, and θ = 0 (upper curve), 0.2 (middle
curve), and 0.5 (lower curve).
[Figure: Power plotted against Delta/p]



information, one could test the fit of the counts of A, B, C, and D
to their expected multinomial proportions (each observation is
clearly independent) as follows: Σ(O−E)²/E, which is equal to

$$
\frac{(A-N\hat{q}^2)^2}{N\hat{q}^2}
+ \frac{[B-N\hat{q}(1-\hat{q})]^2}{N\hat{q}(1-\hat{q})}
+ \frac{[C-N\hat{q}(1-\hat{q})]^2}{N\hat{q}(1-\hat{q})}
+ \frac{[D-N(1-\hat{q})^2]^2}{N(1-\hat{q})^2},
\quad \text{where } \hat{q} = \frac{2A+B+C}{2N}.
$$

This test follows a χ² distribution with 2 df, since we had 4 cell
counts, but fixed the sum A + B + C + D = N, and estimated q from the
data. This test is very powerful over a large range of values of δ/p,
q, and θ, as shown in figure 5, and thus provides a useful general
test for disequilibrium.
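The goodness-of-fit statistic above can be sketched as follows. The function name `hiid_chi2` and the example counts are hypothetical; the 0.05-level critical value for 2 df is 5.99.

```python
import math


def hiid_chi2(a, b, c, d):
    """Fit of table 1 cells to multinomial proportions q^2, q(1-q), q(1-q), (1-q)^2."""
    n = a + b + c + d
    q = (2 * a + b + c) / (2 * n)  # estimate of the H allele frequency
    if q in (0.0, 1.0):
        raise ValueError("H is absent or fixed; expected counts are zero")
    expected = [n * q * q, n * q * (1 - q), n * q * (1 - q), n * (1 - q) ** 2]
    observed = [a, b, c, d]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts for 100 parents; the estimated q is 0.6.
stat = hiid_chi2(30, 40, 20, 10)
print(stat, stat > 5.99)
```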

Relative Power of Nonparametric 
Approaches 

Each of the tests described above has different properties which make
it useful. However, the question remains as to which test should be
used in a given situation. To answer that question, for each
combination of θ, δ/p, and q, we determined which test gave maximal
power for a sample size of 50 families. The results are presented
graphically in figure 6. In this figure, for fixed q, we considered
all possible combinations of δ/p and θ, and determined which test
gave maximal power (analytically computed). Then for each point
(δ/p, θ) the most powerful test is indicated. To see exactly what the
power was, the reader is referred to the power curves already
presented for each test. Some interesting patterns can be seen in
this figure, but it should be used only in conjunction with the
actual values of the power shown in figures 2-5, for often the
difference is small between tests. However, over the most relevant
ranges of δ/p and θ, for all q, the HHRR test is the most powerful.
In light of this, and the relative implausibility of strong
disequilibrium when θ is large, the HHRR test should be the general
nonparametric test of choice, both for its power and its simplicity.

Fig. 6. Graph showing, for all possible values of θ and δ/p, and
fixed q = 0.5, which among three tests is the most powerful (50
families). The values of the power are not shown, but are given in
fig. 2-5 (HMCN is never the most powerful).

Parametric Likelihood Ratio Tests 
If one knows the model of the disease, one could do a parametric
likelihood ratio test analysis, based on theoretical probabilities of
each type of parent under a fixed model. Table 2 of Ott [8] provides
such parametric values for the case of a recessive disease. The
difficulty here is threefold. First, one needs to have an accurate
parametric model for the disease, and compute the parametric
probabilities of each cell of table 1. This process is very tedious
(except for the recessive model described by Ott [8]), and depends
heavily on the disease model. Secondly, one needs to maximize the
likelihood of the data over all the parameters, θ, δ/p, and q, and
then again maximize the likelihood, fixing δ = 0. This would give us
the following likelihood ratio: L(δ/p, θ, q)/L(δ/p = 0, θ, q).
Normally, one can treat 2 × ln(LR) as a χ² random variable, with the
number of degrees of freedom being the difference in free parameters
in numerator and denominator of the likelihood ratio, which would
appear to be 1 in this case. However, when δ = 0, θ disappears as a
parameter, as shown by Ott [8]. When a parameter disappears under the
null hypothesis, it is a degenerate situation, and so the statistic
does not satisfy the criteria for χ². As the distribution is unclear,
this test becomes very awkward to interpret, and presents a situation
analogous to the degenerate likelihood ratio test for linkage in the
presence of heterogeneity [9]. For this reason, combined with the
enormous computer time involved, power was not calculated for this
approach.

For general pedigree data (including nuclear families with multiple
offspring), with a fixed disease model, parametric likelihood ratio
tests are tractable using any linkage analysis program, like ILINK of
the LINKAGE package. One need only maximize the likelihood over θ, q,
p, and δ for the numerator, and again maximize the likelihood for the
denominator over θ, q, and p, fixing δ = 0. This would then be a
valid and powerful general likelihood ratio test of δ = 0,
2 × ln[L(θ, δ, p, q)/L(θ, δ = 0, p, q)]. It is important to remember
that when using this method, the maximum likelihood estimates of the
haplotype frequencies will reflect the sample frequency of the
disease allele, which is not an accurate reflection of its population
frequency. One must be sure to weight disease and control haplotypes
accordingly. For example, if our haplotype frequency estimates are
P(Hd), P(H̄d), P(HD), P(H̄D), and we know the true gene frequency of
the d allele, p_d, we can compute adjusted haplotype frequency
estimates as

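$$
P_{\mathrm{adj}}(Hd) = p_d \left( \frac{P(Hd)}{P(Hd) + P(\bar{H}d)} \right),
$$

and so on. Similarly, if one wanted to estimate the coefficient of
disequilibrium from such ILINK estimates, it would be necessary to
use the adjusted estimates described above, yielding an adjusted
estimate of

$$
\delta_{\mathrm{adj}} = \hat{\delta}\,\frac{p_d(1-p_d)}{\hat{p}_d(1-\hat{p}_d)},
$$

where δ̂ = P(Hd)P(H̄D) − P(HD)P(H̄d), and p̂_d = P(Hd) + P(H̄d).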
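The reweighting can be illustrated numerically. The sketch below assumes that the adjustment rescales the d-bearing haplotype frequencies to sum to the true gene frequency p_d and the D-bearing ones to sum to 1 − p_d; that reading, the function names, and the example frequencies are assumptions made for illustration and are not output from ILINK.

```python
import math


def adjust_haplotype_freqs(p_hd, p_hbar_d, p_h_big_d, p_hbar_big_d, true_pd):
    """Rescale sample haplotype frequencies to a known disease-allele frequency."""
    sample_pd = p_hd + p_hbar_d              # sample frequency of d
    sample_big_d = p_h_big_d + p_hbar_big_d  # sample frequency of D
    scale_d = true_pd / sample_pd
    scale_big_d = (1 - true_pd) / sample_big_d
    return (p_hd * scale_d, p_hbar_d * scale_d,
            p_h_big_d * scale_big_d, p_hbar_big_d * scale_big_d)


def disequilibrium(p_hd, p_hbar_d, p_h_big_d, p_hbar_big_d):
    """delta = P(Hd)P(H-bar D) - P(HD)P(H-bar d)."""
    return p_hd * p_hbar_big_d - p_h_big_d * p_hbar_d


# Hypothetical sample estimates in which d makes up half of the haplotypes,
# while the assumed population frequency of d is 0.01.
sample = (0.35, 0.15, 0.20, 0.30)
adjusted = adjust_haplotype_freqs(*sample, true_pd=0.01)
delta_sample = disequilibrium(*sample)
delta_adjusted = disequilibrium(*adjusted)
print(adjusted, delta_sample, delta_adjusted)
```

Under this assumption the adjusted coefficient equals the sample coefficient multiplied by p_d(1 − p_d) and divided by the corresponding product of sample frequencies.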






An ad hoc method sometimes used in general pedigrees is to assume the
absence of recombination between marker and disease, and determine
the haplotypes of each founder, as a way to ensure the control
(nondisease) haplotypes are from the same genetic population as the
disease haplotypes. This ad hoc approach has been applied, for
example, in cystic fibrosis [10]. It assumes an absence of
recombination, and its statistical properties are, in general,
unclear, especially in cases where θ is actually greater than zero.
Another problem is that it is not always possible to uniquely and
accurately determine all founder haplotypes. Censoring such
indiscernible cases in some instances can be shown to lead to a
statistical bias. In light of all of this, if one wants to use
general pedigree data to test and quantify disequilibrium, the
likelihood ratio test with ILINK described above is the test of
choice, as it is more general and powerful, and has
well-characterized statistical properties.

Nonrecessive Case 

All of our results above were obtained for the case of a recessive
disease. However, when other more complicated models prevail, the
situation becomes unclear. While under any model we choose for the
disease, the above tests are valid tests of δ = 0 (since this implies
no association between the disease and the marker locus), the effect
on the power of our testing procedures is not so clear. When dealing
with a recessive disease, a lot of additional information about
linkage disequilibrium is obtained by looking at each parent
separately, since each parent transmits a disease allele to the
affected offspring, but the situation is less clear when there is a
different model. For a dominant disease, with one affected parent,
and one affected child, one can just consider the affected parent,
and his or her transmitted and nontransmitted alleles, and base a
test on the same procedure as above. The effect would be that there
would be only one observation per family instead of two in the
recessive case (where we know the parents to be heterozygous for the
disease), and there is possible noise when the unaffected parent
actually transmits the disease to the offspring, though this should
be very rare.

In the case of dominant reduced-pene- 
trance disease, in which neither parent is af- 
fected, clearly at least one parent must carry 
the disease-predisposing allele, though we 
cannot discern which one. In this situation, 
one parent will transmit the disease allele (in 
putative disequilibrium with the marker), and 
the other parent will transmit the normal al- 
lele. This adds noise to our system. One would 
expect the Rubinstein method to be less sensi- 
tive to this noise, since it doesn't distinguish 
between heterozygotes and homozygotes for 
the H allele. 

Power calculations were approximated for this situation by
simulation. A simplified model was considered in which one parent was
forced to transmit the disease allele to the affected child, while
the other parent was assumed to be homozygous unaffected (a
reasonable assumption for small p). In this case, δ and p are no
longer completely confounded, so we had to treat p, q, δ, and θ as
separate parameters. Then, 20,000 sets of 100 such nuclear families
with 2 unaffected parents and one affected offspring were simulated
under various assumptions on p, q, δ, and θ. For each set of 100
families, the HHRR and GHRR were calculated. Then the number of
significant results for each test at the 0.05 level (χ² ≥ 3.84) was
counted to estimate the power of each test, which is graphed in
figure 7. An interesting situation arises here, where the HHRR is
much more powerful for negative values of δ, but for positive values
of δ they are just about equal in power, with the GHRR being slightly



Fig. 7. Power curves (simulated) for the HHRR ( — ) and GHRR ( )
tests with a dominant disease (reduced penetrance) and two unaffected
parents, for q = 0.5, p = 0.01, and 100 families, based on 20,000
replicates. The upper curves represent θ = 0, and the lower curves
θ = 0.2. In most cases, the HHRR is shown to be much more powerful
than the GHRR.



more powerful for very extreme values of δ. The HHRR test is also
more powerful than the other haplotype-based nonparametric tests over
most of the reasonable sample space. The HHRR is more powerful than
the GHRR in all recessive situations, dominant situations with δ < 0,
and about equally powerful with the GHRR in dominant situations with
extremely positive δ. Further, the HHRR can take advantage of
dominant situations with one affected parent, while the GHRR cannot.
Therefore, we recommend using the HHRR as the nonparametric test of
choice in general.
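The Monte Carlo procedure described above (simulate many sets of families, compute the statistic, count the significant results) can be sketched in outline. The sketch covers only the null case δ = 0, in which every parental allele is H with probability q independently, so the rejection rate it reports estimates the type I error rather than power. Simulating the alternative would require the disease-model probabilities, which are not reproduced here. The function names, the seed, and the reduced replicate count (2,000 rather than 20,000) are assumptions chosen for illustration.

```python
import random

CRITICAL_1DF = 3.84  # 0.05-level cutoff for 1 df, as quoted in the text


def hhrr_chi2(a, b, c, d):
    """HHRR chi-square from table 1 cells (N = number of parents)."""
    n = a + b + c + d
    h = 2 * a + b + c
    denom = h * (2 * n - h)
    return 0.0 if denom == 0 else 2 * n * (b - c) ** 2 / denom


def null_rejection_rate(n_families=100, q=0.5, n_reps=2000, seed=1):
    """Fraction of simulated null data sets with a significant HHRR result."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_reps):
        counts = [0, 0, 0, 0]  # cells A, B, C, D
        for _ in range(2 * n_families):  # two parents per family
            transmitted_h = rng.random() < q
            nontransmitted_h = rng.random() < q
            index = (0 if transmitted_h else 2) + (0 if nontransmitted_h else 1)
            counts[index] += 1
        if hhrr_chi2(*counts) > CRITICAL_1DF:
            rejections += 1
    return rejections / n_reps


rate = null_rejection_rate()
print(rate)  # should be in the neighborhood of the nominal 0.05 level
```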



Discussion 

When doing an association study, it is often difficult to find
genetically well-matched case and control samples. The HRR approach
of using transmitted and nontransmitted alleles from the same parent
as case and control samples ensures that they are genetically
well-matched [11]. Further, the case and control samples are shown to
be independent under the null hypothesis of δ = 0. In light of this,
HRR-type methods should become increasingly important as geneticists
try to map complex diseases, by looking for associations with
candidate genes for example. In such a case, if the candidate gene is
correct, θ would be equal to 0, and these methods would achieve
maximal power to detect the associations. Further, the built-in
genetic control should provide a solution to the often difficult task
of finding a valid control sample, and should allow people to have
more faith in the validity of such association studies.

The approach presented here extracts further information about
disequilibrium from the data used in the original GHRR approach, and
thus presents a more powerful way to detect such associations in the
absence of a parametric model. Given a parametric model, two
likelihood-based methods were discussed as well. However, from the
results of our power calculations, our HHRR seems to be the best
general nonparametric test considered for detecting such associations
with this experimental design over the most biologically plausible
ranges of δ and θ.






Acknowledgements 

This work was supported by the National Institute of Mental Health
(grant MH44292), the MacArthur Foundation Network I (Task Force on
the Development of Analytic Strategies for Linkage Studies of
Psychiatric Disorders), and the W.M. Keck Foundation. Further, the
authors would like to acknowledge the contributions of Dr. Warren
Ewens, whose comments are greatly appreciated.



References 

1 Aird I, Bentall HH, Roberts JAF: The relationship between cancer of
the stomach and the ABO blood groups. BMJ 1953;1:799-801.

2 Aird I, Bentall HH, Mehigan JA, Roberts JAF: The blood groups in
relation to peptic ulceration and carcinoma of colon, rectum, breast,
and bronchus. BMJ 1954;2:315-321.

3 Pike LA, Dickens AM: ABO blood groups and toxaemia of pregnancy.
BMJ 1954;2:321-323.

4 McConnell RB, Clarke CA, Downton F: Blood groups in carcinoma of
the lung. BMJ 1954;2:323-325.

5 Woolf B: On estimating the relation between blood and disease. Ann
Hum Genet 1955;19:251-253.

6 Rubinstein P, Walker M, Carpenter C, Carrier C, Krassner J, Falk
CT, Ginsburg F: Genetics of HLA disease associations. The use of the
Haplotype Relative Risk (HRR) and the 'Haplo-Delta' (Dh) estimates in
juvenile diabetes from three racial groups. Hum Immunol 1981;3:384.

7 Seuchter SA, Knapp M, Baur MP: Analysis of association in nuclear
families; in Lynch HT, Tautu P (eds): Recent Progress in the Genetic
Epidemiology of Cancer. Berlin, Springer, 1990, pp 89-94.

8 Ott J: Statistical properties of the Haplotype Relative Risk. Genet
Epidemiol 1989;6:127-130.

9 Ott J: Analysis of Human Genetic Linkage. Baltimore, Johns Hopkins
University Press, 1991.

10 Estivill X, Farrall M, Williamson R, Ferrari M, Seia M, Giunta AM,
Novelli G, Potenza L, Dallapicolla B, Borgo G, Gasparini P, Pignatti
PF, De Benedetti L, Vitale E, Devoto M, Romeo G: Linkage
disequilibrium between cystic fibrosis and linked DNA polymorphisms
in Italian families: A collaborative study. Am J Hum Genet
1988;43:23-28.

11 Falk CT, Rubinstein P: Haplotype Relative Risks: An easy way to
construct a control sample for risk calculations. Ann Hum Genet
1987;51:227-233.






Ann. Hum. Genet. (1987), 51, 227-233 


Printed in Great Britain 






Haplotype relative risks: an easy reliable way to construct a proper 
control sample for risk calculations 

C. T. FALK and P. RUBINSTEIN

The Lindsley F. Kimball Research Institute of The New York Blood Center, 310 E. 67th St.,
New York, NY 10021

SUMMARY 

An alternative to Woolf's (1955) relative risk (RR) statistic is proposed for use in calculating
the risk of disease in the presence of particular antigens or phenotypes. This alternative uses, 
as the control sample, the parental antigens or haplotypes not present in the affected child. The 
formulation of a haplotype relative risk (HRR) thus eliminates the problems of sampling from 
the same homogeneous population to form both the disease sample and an appropriate control. 

We show that, in families selected through a single affected individual, where transmission 
of the four parental haplotypes can be followed unambiguously, the mathematical expectation 
of the HRR is identical to that of the RR. Since the sample formed from the 'non-affected'
parental haplotypes is clearly from the same population as the disease sample, the HRR thus 
provides a reliable alternative to the RR. A further advantage obtains when family data are 
being collected as part of a study since the control sample is then automatically contained in 
the family material. 

Data from studies of patients with insulin dependent diabetes mellitus (IDDM) are used to 
obtain an estimate of the risk to those with HLA antigens or phenotypes associated with IDDM 
using the HRR statistic. A comparison of the HRR's and RR's for these data is also presented. 

INTRODUCTION 

Relative risks have been used for some time to estimate the increased risk of contracting a 
disease, given that a certain condition (or trait) is present, over that of the group lacking the 
condition. This formal definition of a relative risk requires prospective information that is not 
easily obtained and the relative risk is often approximated by the more easily obtained cross
product

$$\frac{\Pr(Q \mid \text{aff})\,\Pr(q \mid \text{control})}{\Pr(q \mid \text{aff})\,\Pr(Q \mid \text{control})},$$

where Q stands for the presence of the condition or trait and q for the lack of the condition, 
and the four terms are conditional probabilities as indicated. When the overall frequency of 
the disease in a population is low, this estimate will closely approximate the true relative risk. 
This odds ratio was proposed by Woolf (1955) to estimate the risk of contracting either peptic 
ulcers or stomach cancer for individuals of particular ABO phenotypes. Since then it has been 
used to calculate risks for genetic markers associated with many diseases and its most notable 
use has been in studying several HLA-associated diseases such as insulin dependent diabetes 
mellitus (IDDM), coeliac disease, multiple sclerosis and ankylosing spondylitis. Several assumptions
are generally made about the underlying population from which both the disease sample
and the control sample are obtained, most importantly that both samples are drawn from the
same genetically homogeneous population in an unbiased way. By this we mean that the disease 
sample should be selected on a clear-cut ascertainment criterion, e.g. randomly chosen affected 
individuals with no bias pertaining to other factors, and the control sample should be a strictly 
random sample from the same genetic population. In practice, this latter criterion is rather 
difficult to fulfil and most often the control is created from conveniently available data drawn 
from a population thought to be somewhat closely related to that from which the disease sample 
was drawn. 
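
The relationship between the cross product and the prospective relative risk can be sketched numerically. The snippet below is only an illustration: the population frequency of the trait and the two disease risks are hypothetical values, the function names are made up for this example, and the control group is assumed to consist of unaffected individuals. It shows the cross product tracking the true relative risk closely when the disease is rare and drifting away from it when the disease is common.

```python
def cross_product_ratio(pr_Q_aff, pr_Q_control):
    """Cross-product (odds) ratio from Pr(Q|aff) and Pr(Q|control)."""
    pr_q_aff = 1.0 - pr_Q_aff          # Pr(q | aff)
    pr_q_control = 1.0 - pr_Q_control  # Pr(q | control)
    return (pr_Q_aff * pr_q_control) / (pr_q_aff * pr_Q_control)


def compare(freq_Q, risk_Q, risk_q):
    """Return (true relative risk, cross-product ratio).

    freq_Q : population frequency of the trait Q (hypothetical value)
    risk_Q : probability of disease given Q (hypothetical value)
    risk_q : probability of disease given q (hypothetical value)
    """
    true_rr = risk_Q / risk_q
    prevalence = freq_Q * risk_Q + (1 - freq_Q) * risk_q
    # Bayes' rule gives the trait frequency among affected individuals.
    pr_Q_aff = freq_Q * risk_Q / prevalence
    # Assumption: controls are sampled from unaffected individuals.
    pr_Q_control = freq_Q * (1 - risk_Q) / (1 - prevalence)
    return true_rr, cross_product_ratio(pr_Q_aff, pr_Q_control)


rare_rr, rare_or = compare(freq_Q=0.2, risk_Q=0.004, risk_q=0.001)
common_rr, common_or = compare(freq_Q=0.2, risk_Q=0.4, risk_q=0.1)
print(f"rare disease:   RR = {rare_rr:.2f}, cross product = {rare_or:.2f}")
print(f"common disease: RR = {common_rr:.2f}, cross product = {common_or:.2f}")
```

In both hypothetical populations the true relative risk is 4; only the rare-disease case gives a cross product close to that value.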

Several years ago we proposed (Rubinstein et al. 1981) an alternative method for obtaining 
the control sample for relative risk (RR) estimations that eliminated the problems of sampling 
from a single homogeneous population. This method used, as a control, those parental 
haplotypes not present in the affected child and was therefore called the haplotype relative risk 
(HRR). This method has several appealing features including freedom from collection of proper 
control samples. Additionally, where families are to be studied anyway, collection of the family 
data automatically includes collection of the necessary control sample. It is, however, necessary 
to demonstrate that the HRR estimate has the appropriate characteristics. In this paper we 
will show that, assuming the 'ideal' conditions inherent in the definition of RR, namely, control
and disease samples both randomly chosen from the same homogeneous random mating 
population, the expected value of the HRR is identical to that of the conventional RR. We 
will then illustrate its use in the estimation of risks for HLA antigens and phenotypes associated 
with IDDM.

THE MODEL 

Consider a set of families that has been ascertained through a single affected child, where the 
relevant disease locus is closely linked to a normal polymorphic genetic marker (e.g. HLA) and 
where certain alleles (antigens) are associated with the disease. For purposes of concreteness, 
we will assume that the disease is recessively inherited, although the same arguments hold for 
dominance and for other inheritance models as well. Assume that the HLA haplotypes present 
in the parents can be followed unambiguously in transmission to the offspring and designate
the two inherited by the affected child as 'a' (paternal) and 'c' (maternal). Thus haplotypes
a and c are assumed to carry the disease allele, say 'n'. In the special case where the child
as well as both parents are ac, it is not certain whether the child gets the a from the mother
or the father. However, it is still known that one a and one c haplotype were transmitted to
the affected child, and thus carry the n allele, and that the haplotypes not passed on to the
affected child were also a and c. The latter can therefore be included in the 'random sample'
as described below. Now if we have truly obtained our sample as a random, singly selected
sample, the two parental haplotypes not transmitted to the affected child (say b and d) will
represent a random sample of haplotypes from the population at large and will thus carry the
disease allele (n) or the normal allele (N) with probabilities equal to the allele frequencies in
the population (say $p_1$ and $p_2$, respectively, $p_1 + p_2 = 1$). The validity of this observation requires
compliance with certain other assumptions including (1) that the parents are not inbred, (2) 
that there is no correlation within or between parental phenotypes and (3) that there is no 
differential fertility of the disease phenotypes. 
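
The construction of the control sample from family data can be sketched as follows. This is a minimal illustration, not the authors' procedure as implemented: the function name `non_transmitted`, the tuple layout and the three toy families are all hypothetical, and each parental haplotype is represented only by its DR antigen label.

```python
from collections import Counter


def non_transmitted(father, mother, child):
    """Return the two parental haplotypes not passed to the affected child.

    father, mother : 2-tuples of haplotype labels carried by each parent
    child          : 2-tuple (paternal haplotype, maternal haplotype)
    """
    paternal, maternal = child
    if paternal not in father or maternal not in mother:
        raise ValueError("child haplotypes are inconsistent with the parents")
    father_rest = list(father)
    father_rest.remove(paternal)   # drop the transmitted paternal haplotype
    mother_rest = list(mother)
    mother_rest.remove(maternal)   # drop the transmitted maternal haplotype
    return father_rest[0], mother_rest[0]


# Hypothetical families: (father, mother, affected child).
families = [
    (("DR3", "DRX"), ("DR4", "DRX"), ("DR3", "DR4")),
    (("DR4", "DR3"), ("DRX", "DRX"), ("DR4", "DRX")),
    (("DR3", "DR4"), ("DR3", "DR4"), ("DR3", "DR4")),
]

# One pseudo-control 'individual' per family: the pair of untransmitted haplotypes.
controls = [non_transmitted(f, m, c) for f, m, c in families]
haplotype_counts = Counter(h for pair in controls for h in pair)
print(controls)
print(haplotype_counts)
```

The third toy family mirrors the special case described above, where the child and both parents carry the same pair: whichever parent is taken as the source of each haplotype, the untransmitted pair is the same.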

Now assume that an antigen, Q, at the HLA locus is in positive linkage disequilibrium with
n, the disease allele. We wish to calculate the relative risk to carriers of Q of contracting the
disease. We will use as our control population the set of 'b' and 'd' haplotypes from our sample
of disease families (that is, those haplotypes within a family not carried by the single affected 
proband). Using this control we will then calculate the conventional cross product odds ratio 
given above to obtain the haplotype relative risk (HRR). Define the relevant population 
frequencies as follows : 

$$
\begin{aligned}
f(Q) &= q_1,\\
f(q) &= q_2 = 1 - q_1 \quad \text{(where $q$ represents all other alleles)},\\
f(n) &= p_1,\\
f(N) &= p_2 = 1 - p_1,\\
f(Qn) &= x_1 = p_1 q_1 + \delta,\\
f(QN) &= x_2 = p_2 q_1 - \delta,\\
f(qn) &= x_3 = p_1 q_2 - \delta,\\
f(qN) &= x_4 = p_2 q_2 + \delta,
\end{aligned}
$$

where $\delta$ is the measure of disequilibrium between n and Q.

We now need the four conditional probabilities necessary for the odds ratio. For the affected 
sample these are the same, regardless of how we choose our control. 

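$$\Pr(Q \mid \text{aff}) = \frac{p_1^2 - x_3^2}{p_1^2}, \qquad \Pr(\text{not } Q \mid \text{aff}) = \frac{x_3^2}{p_1^2}.$$

Now since the control haplotypes will be a random sample from the population, the conditional
probabilities will be:

$$\Pr(Q \mid \text{control}) = 1 - (x_3 + x_4)^2 = 1 - q_2^2, \qquad \Pr(\text{not } Q \mid \text{control}) = (x_3 + x_4)^2 = q_2^2.$$

Thus the estimate of the HRR is:

$$\frac{\Pr(Q \mid \text{aff})\,\Pr(\text{not } Q \mid \text{control})}{\Pr(\text{not } Q \mid \text{aff})\,\Pr(Q \mid \text{control})},$$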

which is identical to the equivalent expression for the conventional RR. 
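
The argument of this section can be checked by brute-force enumeration. The sketch below is an illustration under stated assumptions, not part of the original derivation: it assumes a fully penetrant recessive disease, random mating, no recombination between the marker and the disease locus, and arbitrary example values for the allele frequencies and the disequilibrium. The function names are hypothetical. It enumerates all parental haplotype combinations and transmissions, conditions on the child being affected, and recovers the probability of lacking Q among affected children and among the untransmitted haplotype pairs.

```python
from itertools import product


def haplotype_freqs(p1, q1, delta):
    """Population frequencies of the four marker/disease haplotypes."""
    p2, q2 = 1.0 - p1, 1.0 - q1
    return {
        ("Q", "n"): p1 * q1 + delta,  # x1
        ("Q", "N"): p2 * q1 - delta,  # x2
        ("q", "n"): p1 * q2 - delta,  # x3
        ("q", "N"): p2 * q2 + delta,  # x4
    }


def expected_hrr(p1, q1, delta):
    """Return (Pr(not Q | aff), Pr(not Q | control), expected HRR)."""
    freqs = haplotype_freqs(p1, q1, delta)
    haps = list(freqs)
    total = aff_without_Q = ctrl_without_Q = 0.0
    for f1, f2, m1, m2 in product(haps, repeat=4):
        p_parents = freqs[f1] * freqs[f2] * freqs[m1] * freqs[m2]
        # Each parent transmits either haplotype with probability 1/2.
        father_options = [(f1, f2), (f2, f1)]
        mother_options = [(m1, m2), (m2, m1)]
        for (f_trans, f_rest), (m_trans, m_rest) in product(father_options, mother_options):
            if f_trans[1] != "n" or m_trans[1] != "n":
                continue  # child is not nn, hence not affected (recessive model)
            weight = 0.25 * p_parents
            total += weight
            if f_trans[0] == "q" and m_trans[0] == "q":
                aff_without_Q += weight   # affected child lacks Q
            if f_rest[0] == "q" and m_rest[0] == "q":
                ctrl_without_Q += weight  # untransmitted pair lacks Q
    pr_aff = aff_without_Q / total
    pr_ctrl = ctrl_without_Q / total
    hrr = ((1.0 - pr_aff) * pr_ctrl) / (pr_aff * (1.0 - pr_ctrl))
    return pr_aff, pr_ctrl, hrr


# Example values (assumptions): p1 = 0.1, q1 = 0.2, delta = 0.01.
pr_aff, pr_ctrl, hrr = expected_hrr(0.1, 0.2, 0.01)
print(pr_aff, pr_ctrl, hrr)
```

With these example values the enumeration agrees with the closed forms: Pr(not Q | aff) equals $x_3^2/p_1^2$ and Pr(not Q | control) equals $q_2^2$, so the untransmitted haplotypes behave as a random population sample.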



EXAMPLE 

Using data collected for the 9th HLA Workshop (Bertrams & Baur, 1984) we looked at the 
sample of families, submitted for study, where a single child was affected with IDDM and where 
the ethnic background was caucasoid (Western European or North American). The patients 






Table 1. DR phenotypes of IDDM disease sample, simplex cases

DR type     No. obs.    No. exp.
DR3, 3          6          7.8
DR3, 4         25         18.2
DR4, 4          4         10.7
DR3, X         16         19.1
DR4, X         29         22.4
DRX, X         10         11.7
Total          90         89.9

p(DR3) = 0.294; p(DR4) = 0.344; p(DRX) = 0.361; α/β = (0.278)/(0.202) = 1.38.

Table 2. DR phenotypes of control sample consisting of non-affected parental haplotypes

DR type     No. obs.    No. exp.
DR3, 3          0          0.77
DR3, 4          2          1.23
DR4, 4          0          0.49
DR3, X         13         12.26
DR4, X         10          9.76
DRX, X         48         48.49
Total          73         73.00

p(DR3) = 0.103; p(DR4) = 0.082; p(DRX) = 0.815; χ² = 1.79, 2 d.f.

were categorized with respect to their HLA DR phenotypes using three distinct allelic groups,
DR3, DR4, and DRX, where DRX represents all other DR antigens except DR3 and DR4. The
results are shown in Table 1 with estimated allele frequencies and observed and
'Hardy-Weinberg expected' numbers for each phenotypic class. The α/β ratio of Falk et al. (1983) was
also calculated and found to be 1.38. This ratio relates the observed frequency (α) of, say, the
DR3, 4 phenotype to the Hardy-Weinberg expected frequency [β = 2p(DR3)p(DR4)] in a
sample of diseased individuals (Table 1). A value in excess of 1.0 is an indication that the
associated susceptibility locus does not show a simple dominant or recessive mode of inheritance
with a single susceptibility allele. The value of 1.38 found here is characteristic of samples of
IDDM individuals, where an excess of DR3, 4's is often observed, thus suggesting a more complex
mode of inheritance for susceptibility (Falk, 1984). The 'control group' was made up of the
parental haplotype pairs not present in the affected child (only families in which all four HLA
haplotypes could be followed were used). There were 146 parental control haplotypes. The allele
frequencies for DR3, DR4, and DRX in this group were 0.103, 0.082, and 0.815 respectively.
These values agree remarkably well with the total frequencies obtained for the 'random mating
population' comprising all caucasoid random individuals submitted to the 9th HLA Workshop
(Baur et al. 1984) (see, e.g., the table on page 694, where the DR marginal frequencies are 0.122,
0.129, and 0.749 for the same three DR alleles). If the control haplotypes from each family
are assumed to be a 'control individual', we obtain a control population sample of 73 which
is in H-W equilibrium (χ² = 1.79, 2 d.f., see Table 2).
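
The quantities reported in Tables 1 and 2 can be recomputed from the observed counts. The sketch below is illustrative only: the function name `hardy_weinberg` and the dictionary layout are assumptions of this example, and it works with unrounded allele frequencies, so its results can differ from the published figures in the last digit.

```python
from itertools import combinations_with_replacement


def hardy_weinberg(observed):
    """Allele frequencies, H-W expected counts and Pearson chi-square.

    observed maps sorted phenotype pairs, e.g. ("DR3", "DR4"), to counts.
    """
    n = sum(observed.values())
    alleles = sorted({a for pair in observed for a in pair})
    # Gene counting: every individual contributes two haplotypes.
    freq = {
        a: sum(count * pair.count(a) for pair, count in observed.items()) / (2 * n)
        for a in alleles
    }
    expected = {}
    for a, b in combinations_with_replacement(alleles, 2):
        prob = freq[a] ** 2 if a == b else 2 * freq[a] * freq[b]
        expected[(a, b)] = n * prob
    chi_square = sum(
        (observed.get(pair, 0) - exp) ** 2 / exp for pair, exp in expected.items()
    )
    return freq, expected, chi_square


# Observed counts from Table 1 (disease sample) and Table 2 (parental control).
table1 = {("DR3", "DR3"): 6, ("DR3", "DR4"): 25, ("DR4", "DR4"): 4,
          ("DR3", "DRX"): 16, ("DR4", "DRX"): 29, ("DRX", "DRX"): 10}
table2 = {("DR3", "DR3"): 0, ("DR3", "DR4"): 2, ("DR4", "DR4"): 0,
          ("DR3", "DRX"): 13, ("DR4", "DRX"): 10, ("DRX", "DRX"): 48}

freq1, exp1, chi1 = hardy_weinberg(table1)
freq2, exp2, chi2 = hardy_weinberg(table2)

# alpha/beta: observed DR3,4 frequency over its H-W expectation in the disease sample.
alpha = table1[("DR3", "DR4")] / sum(table1.values())
beta = 2 * freq1["DR3"] * freq1["DR4"]
alpha_beta = alpha / beta

print({a: round(f, 3) for a, f in freq1.items()}, round(alpha_beta, 2))
print({a: round(f, 3) for a, f in freq2.items()}, round(chi2, 2))
```

The chi-square returned for Table 2 is the plain Pearson statistic summed over the six phenotype classes; the degrees of freedom are taken from the text and are not computed here.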

In Table 3, we compare the HRR's for DR3 and DR4 to the RR's calculated using a
'contrived control population' from the 9th HLA Workshop population data referred to above.
This 'population' is assumed to be in H-W equilibrium and our 'random sample' is of the same



Table 3. HRR's and RR's for the DR3 and DR4 antigens in a sample of simplex IDDM patients

(The control for the HRR's is the sample of parental haplotypes not present in the affected individuals.
The control for the RR's was obtained by 'creating' a H-W sample assuming the antigen frequencies
recorded for the 9th HLA workshop (Baur et al. 1984).)

HRR

DR3            +       -     Total
Disease       47      43       90
Control       15      58       73
Total         62     101      163
HRR = 4.23
p = 2.6 × 10⁻⁵

DR4            +       -     Total
Disease       58      32       90
Control       12      61       73
Total         70      93      163
HRR = 9.21
p = 7.6 × 10⁻¹⁰

RR

DR3            +       -     Total
Disease       47      43       90
Control       21      69       90
Total         68     112      180
RR = 3.59
p = 5.3 × 10⁻⁵

DR4            +       -     Total
Disease       58      32       90
Control       22      68       90
Total         80     100      180
RR = 5.60
p = 6.8 × 10⁻⁹



Table 4. HRR's and RR's for the DR3, 3, DR3, 4 and DR4, 4 phenotypes

(Samples are the same as those described in Table 3. In each case comparison is made relative to the
'base group' DRX, X to avoid the problems of non-independent risk estimates.)

DR type     Disease sample    Parental control    Workshop control
DR3, 3             6                  0                  1.3
DR3, 4            25                  2                  2.8
DR4, 4             4                  0                  1.5
DR3, X            16                 13                 16.4
DR4, X            29                 10                 17.4
DRX, X            10                 48                 50.5
Total             90                 73                 89.9

HRR                          RR
HRR(3, 4) = 60.0             RR(3, 4) = 45.1
HRR(3, 3) = undefined        RR(3, 3) = 23.3
HRR(4, 4) = undefined        RR(4, 4) = 13.5

If 'expected values' are substituted for the zero observations in the parental control, one gets:

HRR'(3, 3) = 37.4,
HRR'(4, 4) = 39.2.



size as our disease sample (i.e. 90 individuals). Table 4 gives HRR's and RR's for the three DR 
phenotypes DR3, 3, DR3, 4, and DR4, 4 using the same samples. Here the risks are compared 
to the baseline phenotype DRX, X in each case since the risks are not independent (cf. 
Curie-Cohen, 1981, Svejgaard & Ryder, 1981). Note that the HRR's for DR3, 3 and DR4, 4 
are undefined since there are no 'individuals' with those phenotypes in the control sample of 
73. If expected values are substituted for the 'zero' values in those cases, HRR's can be estimated
as given at the bottom of Table 4, but the use of such estimates must be made with caution. 
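
The risk estimates in Tables 3 and 4 follow from the cross-product ratio applied to the tabulated counts. The sketch below recomputes them; the helper name `odds_ratio` is an assumption of this example, and returning `None` for an empty control cell is one possible way of flagging an undefined ratio, not something prescribed by the text.

```python
def odds_ratio(case_pos, case_neg, ctrl_pos, ctrl_neg):
    """Cross-product odds ratio for a 2x2 table; None when it is undefined."""
    denominator = case_neg * ctrl_pos
    if denominator == 0:
        return None
    return (case_pos * ctrl_neg) / denominator


# Table 3: antigen present (+) versus absent (-).
hrr_dr3 = odds_ratio(47, 43, 15, 58)  # parental-haplotype control
hrr_dr4 = odds_ratio(58, 32, 12, 61)
rr_dr3 = odds_ratio(47, 43, 21, 69)   # workshop-based control
rr_dr4 = odds_ratio(58, 32, 22, 68)

# Table 4: each phenotype compared with the DRX, X base group
# (10 in the disease sample, 48 in the parental control).
hrr_34 = odds_ratio(25, 10, 2, 48)
hrr_33 = odds_ratio(6, 10, 0, 48)     # no DR3, 3 'individuals' among the controls
hrr_44 = odds_ratio(4, 10, 0, 48)     # no DR4, 4 'individuals' among the controls

# Substituting the H-W expected values (0.77 and 0.49) for the zero observations.
hrr_33_adj = odds_ratio(6, 10, 0.77, 48)
hrr_44_adj = odds_ratio(4, 10, 0.49, 48)

for name, value in [("HRR DR3", hrr_dr3), ("HRR DR4", hrr_dr4),
                    ("RR DR3", rr_dr3), ("RR DR4", rr_dr4),
                    ("HRR(3,4)", hrr_34), ("HRR(3,3)", hrr_33),
                    ("HRR'(3,3)", hrr_33_adj), ("HRR'(4,4)", hrr_44_adj)]:
    print(name, "undefined" if value is None else round(value, 2))
```

Because the adjusted estimates depend entirely on the expected values inserted for the empty cells, they inherit the caution expressed above.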






DISCUSSION 

One of the major problems inherent in proper calculations of relative risks (RR's) is that of 
choosing an appropriate control. A basic assumption in the use of RR's is that both the affected 
sample and the control sample are chosen at random from the same genetically homogeneous 
random mating population with no selection criteria except for the disease status required for 
inclusion in the affected sample. In practice this is a difficult criterion to fulfil. Additionally, 
it adds a significant amount of work to select and test such a control sample. It is therefore 
often assumed that the control sample is simply a hypothetical sample created from a population 
thought to be similar to that of the disease sample and 'generated' from that population by
assuming H-W equilibrium and some reasonable sample size (cf. Svejgaard & Ryder, 1981, and
our 'contrived' sample of the previous section).

Given the known heterogeneity of current urban populations, even within the less heterogeneous
European countries, use of population control data culled, for example, from HLA
workshop surveys, may alter the significance of calculated RR's. Although in the examples
given here the results are significant for both RR's and HRR's (Table 3), the 'p-values' for
significance differ by two-fold (for DR3) and 100-fold (for DR4), with the HRR's being more
significant in each case. If less extreme samples were tested, careless choice of the control group
could very well make the difference between statistical significance and non-significance
(resulting in either a type I or a type II error).

Methods have previously been proposed for using sibship information to calculate 'risks'. For
example, Clarke (1961) describes a method, attributed to C. A. B. Smith, for using sibships to
test for a significant risk of duodenal ulcers to individuals of blood group O. The method used
is somewhat different from that described here in that an observed and expected probability
of being group O is assigned to the propositus in each sibship, where the expected value depends
on the makeup of the sibship. The significance is then based on a comparison of pooled observed
and expected values over a set of sibships. This method does overcome the problem of
heterogeneity but, because of the way the test is constructed, only a small part of the data can
be used. In Clarke's example, therefore, the associations found when using the general 
population as a control were very much decreased when using Smith's sibship method. This does 
not seem to be the case using HRR's where the associations remain strong. 

By using the two parental haplotypes not present in the single diseased individuals of the 
disease sample as the 'control sample', we are assured of having both samples from the same
genetic population and, as was demonstrated above, this sample should represent a random
sample of haplotype pairs (or 'individuals') from that population. Care must still be taken to
ensure that the population chosen is genetically homogeneous, to the extent possible, but the 
task of obtaining an appropriate control is simplified. 

If the disease is dominant rather than recessive, the HRR can still be used in the same way. 
Although it is not known whether the disease allele is present on the paternal haplotype ('a') 
or the maternal ('c') or perhaps on both, the other two parental haplotypes, b and d, will still
represent random haplotypes from the underlying population, provided that the conditions 
mentioned for the recessive case obtain. 

If a family is selected through more than one affected child, the situation is somewhat
different. If the two affected sibs share the same two HLA haplotypes then the other two should
still represent random haplotypes from the population. However, if they share fewer than two
haplotypes, the situation is more complicated. Now three (or possibly four) haplotypes are 
known to carry the disease allele in the recessive case. If the disease is dominant, it is possible, 
but not certain, that a single shared haplotype carries the disease allele. If no haplotype is 
shared, it is not possible to define disease-carrying haplotypes with certainty. In such cases it 
would therefore be difficult to define a control sample of random haplotypes meeting the 
necessary criteria. 

Two other points should be emphasized. If there is differential selection between genotypes 
at the susceptibility locus, (e.g. reduced fertility) a bias might be introduced such that the 
control haplotypes could no longer be considered a random population sample. Thus we require 
compliance with assumption (3) of our model to ensure the proper distribution of susceptibility 
alleles in the 'control' haplotypes.

Further, if the population from which the sample is drawn is genetically heterogeneous with 
respect to the disease, the HRR as well as the RR may be difficult to interpret as well as to 
use. In an extreme case a population might be made up of two ethnically distinct subpopulations 
that do not interbreed. Assume that the disease of interest occurs in only one of two such 
subpopulations. An estimate of the HRR would come entirely from a sample taken from the 
subpopulation where the disease is present and would be relevant only to that population 
(individuals in the other group having no risk, by definition). On the other hand, the RR would 
assign a risk over the entire population that would be too low for individuals in the susceptible 
part of the population and too high for individuals in the non-susceptible part. 

We wish to thank Drs Jurg Ott, Neil Risch and C. A. B. Smith for helpful and constructive comments on 
an earlier draft of this paper. 
This work was supported by NIH grant GM29177.

REFERENCES 

Baur, M. P., Neugebauer, M. & Albert, E. D. (1984). Reference tables of two-locus haplotype frequencies
    for all MHC marker loci. In Histocompatibility Testing (eds. E. D. Albert, M. P. Baur and W. R. Mayr), pp.
    677-755. Berlin: Springer-Verlag.
Bertrams, J. & Baur, M. P. (1984). Insulin-dependent diabetes mellitus. In Histocompatibility Testing (eds.
    E. D. Albert, M. P. Baur and W. R. Mayr), pp. 348-358. Berlin: Springer-Verlag.
Clarke, C. A. (1961). Blood Groups and Disease. Progress in Medical Genetics 1, 81-119.
Curie-Cohen, M. (1981). HLA antigens and susceptibility to juvenile diabetes: do additive relative
    risks imply genetic heterogeneity? Tissue Antigens 17, 136-148.
Falk, C. T., Mendell, N. R. & Rubinstein, P. (1983). Effect of population associations and reduced
    penetrance on observed and expected genotype frequencies in a simple genetic model: application to HLA
    and insulin dependent diabetes mellitus. Ann. Hum. Genet. 47, 161-165.
Falk, C. T. (1984). A two-susceptibility-allele model for genetic diseases and associated marker loci: differences
    and similarities to a one-s-allele model. Ann. Hum. Genet. 48, 87-95.
Rubinstein, P., Walker, M., Carpenter, C., Carrier, C., Krassner, J., Falk, C. & Ginsberg, F. (1981).
    Genetics of HLA disease associations. The use of the haplotype relative risk (HRR) and the "haplo-delta"
    (Dh) estimates in juvenile diabetes from three racial groups. Human Immunology 3, 384 (Abstract).
Svejgaard, A. & Ryder, L. P. (1981). HLA genotype distribution and genetic models of insulin-dependent
    diabetes mellitus. Ann. Hum. Genet. 45, 293-298.
Woolf, B. (1955). On estimating the relation between blood group and disease. Ann. Hum. Genet. 19, 251-253.






